[ https://issues.apache.org/jira/browse/OPENNLP-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martin Wiesner updated OPENNLP-1702:
------------------------------------
    Description:
With the recent addition of {{BratNameSampleStreamFactoryTest}} via OPENNLP-1695, it became obvious (Eval test run), that the code in BratDocumentStream is prone to non-determinism. This stems from the fact that {{java.io.File#listFiles(..)}} does not guarantee any order of the returned elements.

A potential fix for achieving determinism again is to sort the result of listFiles(..) alphabetically in ASC order.

  was:
With the recent addition of {{BratNameSampleStreamFactoryTest}} via OPENNLP-1695, it became obvious (Eval test run), that the code in BratDocumentStream is prone to non-determinism. This stems from the fact that {{java.io.File#listFiles(..)}} does not guarantee any order of the returned elements.

A potential fix for achieving determinism again, is to sort the result of listFiles(..) alphabetically in ASC order.

> BratDocumentStream should process files in bratCorpusDir deterministically
> --------------------------------------------------------------------------
>
>                 Key: OPENNLP-1702
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1702
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Build, Packaging and Test
>    Affects Versions: 2.5.3
>            Reporter: Martin Wiesner
>            Assignee: Martin Wiesner
>            Priority: Minor
>             Fix For: 2.5.4
>
>
> With the recent addition of {{BratNameSampleStreamFactoryTest}} via
> OPENNLP-1695, it became obvious (Eval test run), that the code in
> BratDocumentStream is prone to non-determinism. This stems from the fact that
> {{java.io.File#listFiles(..)}} does not guarantee any order of the returned
> elements.
> A potential fix for achieving determinism again is to sort the result of
> listFiles(..) alphabetically in ASC order.
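For illustration only, the proposed fix could look roughly like the sketch below. The class and method names (SortedListing, listFilesSorted) are hypothetical and not taken from the OpenNLP code base; the actual change to BratDocumentStream may differ. The sketch sorts by file name with a plain case-sensitive String comparison so that the order does not depend on the platform.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, not part of OpenNLP: shows one way to make a directory listing deterministic.
final class SortedListing {

  private SortedListing() {
  }

  // Returns the entries of 'dir' sorted by name in ascending order.
  // File#listFiles() makes no ordering guarantee, so the result is sorted explicitly.
  static File[] listFilesSorted(File dir) {
    File[] files = dir.listFiles();
    if (files == null) {
      // listFiles() returns null if 'dir' is not a directory or an I/O error occurs.
      return new File[0];
    }
    // Case-sensitive comparison of file names, independent of the underlying file system.
    Arrays.sort(files, Comparator.comparing(File::getName));
    return files;
  }
}
```

Comparing by name rather than relying on File's natural ordering is a deliberate choice in this sketch: File#compareTo compares path names and its case sensitivity depends on the underlying system.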