[
https://issues.apache.org/jira/browse/UIMA-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13674530#comment-13674530
]
Jens Grivolla commented on UIMA-2670:
-------------------------------------
While the bug is tiny, it makes it impossible to handle segments correctly when
using the FileSystemCollectionReader. Segment-aware components (such as CAS
consumers) cannot tell whether a given CAS represents a complete document.
> FileSystemCollectionReader doesn't set lastSegment correctly
> ------------------------------------------------------------
>
> Key: UIMA-2670
> URL: https://issues.apache.org/jira/browse/UIMA-2670
> Project: UIMA
> Issue Type: Bug
> Components: Examples
> Affects Versions: 2.4.0SDK
> Reporter: Jens Grivolla
> Original Estimate: 10m
> Remaining Estimate: 10m
>
> FileSystemCollectionReader only sets lastSegment=true (in the
> SourceDocumentInformation) on the last document. Given that it loads
> individual documents, not segments of a document, this should be "true" for
> each CAS that it generates.
> This is a problem when later using a CAS multiplier to segment the CAS. It
> should be possible to check whether the CAS is a complete document or a
> segment by testing for "offsetInSource==0 && lastSegment==true".
> In org.apache.uima.examples.cpe.FileSystemCollectionReader:166,
>     srcDocInfo.setLastSegment(mCurrentIndex == mFiles.size());
> should be:
>     srcDocInfo.setLastSegment(true);
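
The effect of the one-line change on the check quoted above can be sketched without a UIMA dependency. The following is only an illustration: DocInfo is a hypothetical stand-in for the two SourceDocumentInformation fields involved, readCurrent and readFixed are made-up names that imitate the reported and the proposed behaviour, and the sketch assumes that the reader advances its index before setting the flag and always reports offset 0.

```java
import java.util.ArrayList;
import java.util.List;

public class LastSegmentSketch {

    /** Hypothetical stand-in for the two SourceDocumentInformation fields used here. */
    static final class DocInfo {
        final int offsetInSource;
        final boolean lastSegment;

        DocInfo(int offsetInSource, boolean lastSegment) {
            this.offsetInSource = offsetInSource;
            this.lastSegment = lastSegment;
        }
    }

    /** The check from the report: starts at offset 0 and is its own last segment. */
    static boolean isCompleteDocument(DocInfo info) {
        return info.offsetInSource == 0 && info.lastSegment;
    }

    /** Imitates the reported behaviour: the flag depends on the position in the file list. */
    static List<DocInfo> readCurrent(int fileCount) {
        List<DocInfo> result = new ArrayList<>();
        int currentIndex = 0;
        while (currentIndex < fileCount) {
            currentIndex++; // assumption: the index is advanced before the flag is set
            result.add(new DocInfo(0, currentIndex == fileCount));
        }
        return result;
    }

    /** Imitates the proposed fix: every file is flagged as a complete document. */
    static List<DocInfo> readFixed(int fileCount) {
        List<DocInfo> result = new ArrayList<>();
        for (int i = 0; i < fileCount; i++) {
            result.add(new DocInfo(0, true));
        }
        return result;
    }

    /** Counts how many entries pass the complete-document check. */
    static long countComplete(List<DocInfo> infos) {
        return infos.stream().filter(LastSegmentSketch::isCompleteDocument).count();
    }

    public static void main(String[] args) {
        System.out.println("current: " + countComplete(readCurrent(3)) + " of 3 seen as complete");
        System.out.println("fixed:   " + countComplete(readFixed(3)) + " of 3 seen as complete");
    }
}
```

With the position-dependent flag, a downstream component applying the check would treat every file except the last as a segment of some larger document, even though each file is a whole document.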