[
https://issues.apache.org/jira/browse/UIMA-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682780#comment-13682780
]
Marshall Schor commented on UIMA-2670:
--------------------------------------
This is "example" code (though it's true that people often use it as is). The
sample uses the lastSegment flag to indicate the end of the collection.
If this were changed, it could break other users' code that relies on the
current behavior of this flag.
A workaround is, of course, to keep your own copy of this example code and
change that line there.
I suppose we could make this configurable via a parameter. If you would
like to contribute such a fix, we can put it in. For backwards compatibility,
however, it should work as it does now when the parameter isn't specified.
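
To make the backwards-compatibility requirement concrete, here is a minimal
sketch of the decision logic, written as plain Java with no UIMA dependency.
The parameter name "AlwaysLastSegment" and the class LastSegmentPolicy are
hypothetical, not part of the existing example code; the fallback branch
reuses the expression quoted in the issue.

```java
// Sketch only: "AlwaysLastSegment" is a hypothetical parameter name,
// not an existing setting of FileSystemCollectionReader.
final class LastSegmentPolicy {

    static final String PARAM_ALWAYS_LAST_SEGMENT = "AlwaysLastSegment";

    private LastSegmentPolicy() {
        // static helper, not meant to be instantiated
    }

    /**
     * Decides the value to pass to srcDocInfo.setLastSegment(...).
     *
     * @param alwaysLastSegment value of the optional parameter,
     *                          or null when it was not specified
     * @param currentIndex      the reader's mCurrentIndex at the point
     *                          where the flag is set
     * @param fileCount         mFiles.size()
     */
    static boolean isLastSegment(Boolean alwaysLastSegment,
                                 int currentIndex, int fileCount) {
        if (Boolean.TRUE.equals(alwaysLastSegment)) {
            // New opt-in behavior: every CAS holds a complete document.
            return true;
        }
        // Parameter absent (null) or false: unchanged behavior, the flag
        // marks only the last document of the collection.
        return currentIndex == fileCount;
    }
}
```

In the reader, the existing line would then presumably become a call such as
isLastSegment((Boolean) getConfigParameterValue(PARAM_ALWAYS_LAST_SEGMENT),
mCurrentIndex, mFiles.size()), with the parameter also declared as an optional
Boolean in the descriptor. Treating a missing value the same as false is what
keeps existing pipelines working unchanged.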
> FileSystemCollectionReader doesn't set lastSegment correctly
> ------------------------------------------------------------
>
> Key: UIMA-2670
> URL: https://issues.apache.org/jira/browse/UIMA-2670
> Project: UIMA
> Issue Type: Bug
> Components: Examples
> Affects Versions: 2.4.0SDK
> Reporter: Jens Grivolla
> Original Estimate: 10m
> Remaining Estimate: 10m
>
> FileSystemCollectionReader only sets lastSegment=true (in the
> SourceDocumentInformation) on the last document. Given that it loads
> individual documents, not segments of a document, this should be "true" for
> each CAS that it generates.
> This is a problem when later using a CAS multiplier to segment the CAS. It
> should be possible to check whether the CAS is a complete document or a
> segment by testing for "offsetInSource==0 && lastSegment==true".
> in org.apache.uima.examples.cpe.FileSystemCollectionReader:166
> srcDocInfo.setLastSegment(mCurrentIndex == mFiles.size());
> should be:
> srcDocInfo.setLastSegment(true);