[ 
https://issues.apache.org/jira/browse/UIMA-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800907#comment-13800907
 ] 

Richard Eckart de Castilho commented on UIMA-2670:
--------------------------------------------------

The JavaDoc for SourceDocumentInformation says

{quote}
For a CAS that represents a segment of a larger source document, this flag 
indicates whether this CAS is the final segment of the source document.
{quote}

I agree with Jens that the behavior of FileSystemCollectionReader is broken 
with regard to that definition and should be fixed. I think we should be 
allowed to fix bugs, even if it may break things for some people. It is example 
code, so the impact should be minimal. In fact, if people rely on that code, 
they probably had it coming anyway. After all, the main purpose of examples 
should be to reflect the current best practices, which may change. 
Introducing complexity into examples to maintain backwards compatibility 
defeats the goal of examples being minimal representatives of the 
current best practices.

That said, I also agree with Marshall that people should use their own versions 
of example code instead of relying on the examples we provide. It might be a 
good idea to explicitly mark such code as "may be subject to changes without 
further notice" so as to make people aware that it is not covered by the 
compatibility standards that we apply to other parts of the framework.

> FileSystemCollectionReader doesn't set lastSegment correctly
> ------------------------------------------------------------
>
>                 Key: UIMA-2670
>                 URL: https://issues.apache.org/jira/browse/UIMA-2670
>             Project: UIMA
>          Issue Type: Bug
>          Components: Examples
>    Affects Versions: 2.4.0SDK
>            Reporter: Jens Grivolla
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> FileSystemCollectionReader only sets lastSegment=true (in the 
> SourceDocumentInformation) on the last document. Given that it loads 
> individual documents, not segments of a document, this should be "true" for 
> each CAS that it generates.
> This is a problem when later using a CAS multiplier to segment the CAS. It 
> should be possible to check whether the CAS is a complete document or a 
> segment by testing for "offsetInSource==0 && lastSegment==true".
> In org.apache.uima.examples.cpe.FileSystemCollectionReader:166, the line
> srcDocInfo.setLastSegment(mCurrentIndex == mFiles.size());
> should be:
> srcDocInfo.setLastSegment(true);
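
To make the difference concrete, here is a minimal sketch in plain Java that 
does not depend on UIMA. DocInfo is a hypothetical stand-in for the two 
SourceDocumentInformation fields discussed above, and readCurrent / 
readProposed only imitate the reader's loop; they are not the actual 
FileSystemCollectionReader code. The sketch assumes the reader advances its 
index before setting the flag, which is what makes the comparison true on the 
last file, as the report describes.

```java
import java.util.ArrayList;
import java.util.List;

public class LastSegmentSketch {

    /** Hypothetical stand-in for the two SourceDocumentInformation fields in question. */
    public static final class DocInfo {
        final int offsetInSource;
        final boolean lastSegment;

        DocInfo(int offsetInSource, boolean lastSegment) {
            this.offsetInSource = offsetInSource;
            this.lastSegment = lastSegment;
        }
    }

    /** The check from the issue: a CAS holds a whole document, not a segment. */
    public static boolean isCompleteDocument(DocInfo info) {
        return info.offsetInSource == 0 && info.lastSegment;
    }

    /** Imitates the reported behavior: only the last file gets lastSegment=true. */
    public static List<DocInfo> readCurrent(int fileCount) {
        List<DocInfo> out = new ArrayList<>();
        int currentIndex = 0;
        while (currentIndex < fileCount) {
            currentIndex++; // assumption: index is advanced before the flag is set
            out.add(new DocInfo(0, currentIndex == fileCount));
        }
        return out;
    }

    /** Imitates the proposed fix: every file is a complete document. */
    public static List<DocInfo> readProposed(int fileCount) {
        List<DocInfo> out = new ArrayList<>();
        for (int i = 0; i < fileCount; i++) {
            out.add(new DocInfo(0, true));
        }
        return out;
    }

    /** Counts how many entries pass the complete-document check. */
    public static long countComplete(List<DocInfo> infos) {
        return infos.stream().filter(LastSegmentSketch::isCompleteDocument).count();
    }

    public static void main(String[] args) {
        System.out.println("current:  " + countComplete(readCurrent(3)) + " of 3 recognized as complete");
        System.out.println("proposed: " + countComplete(readProposed(3)) + " of 3 recognized as complete");
    }
}
```

With three input files, the imitation of the current behavior lets only the 
last one pass the check, while the proposed change lets all three pass, so a 
downstream CAS multiplier could tell whole documents from segments.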



--
This message was sent by Atlassian JIRA
(v6.1#6144)
