[ 
https://issues.apache.org/jira/browse/MAHOUT-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13751794#comment-13751794
 ] 

Stevo Slavic commented on MAHOUT-1302:
--------------------------------------

I've just committed a change that fixes the issue by making the order in which 
mail archives and (sub)directories are processed deterministic and 
OS-independent: files are processed first, then nested directories, just as 
the {{SequenceFilesFromMailArchivesTest.testSequential}} unit test expects.
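
For illustration only, here is a minimal sketch of a files-first, then-directories traversal. It is not the committed code: the class and method names are hypothetical, and sorting by name within each group is an assumption added to make the order fully reproducible across platforms.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Mahout.
class DeterministicListing {

  // Plain files sort before directories; ties are broken by name (an assumption).
  static final Comparator<File> FILES_THEN_DIRS = (a, b) -> {
    if (a.isDirectory() != b.isDirectory()) {
      return a.isDirectory() ? 1 : -1;
    }
    return a.getName().compareTo(b.getName());
  };

  // File.listFiles() makes no ordering promise, so sort the result explicitly.
  static List<File> listSorted(File dir) {
    File[] children = dir.listFiles();
    if (children == null) {
      return new ArrayList<>(); // not a directory, or an I/O error occurred
    }
    Arrays.sort(children, FILES_THEN_DIRS);
    return new ArrayList<>(Arrays.asList(children));
  }

  // Visits all files of a directory before descending into its subdirectories.
  static void traverse(File dir, List<String> visited) {
    for (File child : listSorted(dir)) {
      if (child.isDirectory()) {
        traverse(child, visited);
      } else {
        visited.add(child.getName());
      }
    }
  }
}
```

With an ordering like this, the sequence of visited files depends only on the directory contents, not on what order the OS happens to return them in.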

I don't like the design of {{SequenceFilesFromMailArchives}}: it uses 
{{PrefixAdditionFilter}}, which is a {{FileFilter}}, to traverse the FS tree. 
That felt unnatural before my change, and feels even more unnatural after it.

                
> SequenceFilesFromMailArchivesTest.testSequential failing
> --------------------------------------------------------
>
>                 Key: MAHOUT-1302
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1302
>             Project: Mahout
>          Issue Type: Bug
>          Components: Integration
>    Affects Versions: 0.8
>         Environment: ubuntu-3 and ubuntu-6 Apache Jenkins nodes
>            Reporter: Stevo Slavic
>            Assignee: Suneel Marthi
>            Priority: Minor
>              Labels: test
>             Fix For: 0.9
>
>
> SequenceFilesFromMailArchivesTest.testSequential fails only on the ubuntu3 
> and ubuntu6 Jenkins nodes. Because of that, MahoutQuality and integration job 
> builds fail or succeed depending on which node they run on.
> The test fails because it expects the entries in the chunk-0 SequenceFile to 
> be in a specific order, but that order is not guaranteed because of the way 
> chunk-0 is created and filled: SequenceFilesFromMailArchives traverses its 
> input using Java's
> File[] java.io.File.listFiles(FileFilter filter)
> which does not guarantee the order of files/directories.
> Unless we want SequenceFileIterator to guarantee order by sorting, the test 
> needs to be changed to verify the presence of the given files and their 
> content, but not their exact order.
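
The alternative described in the issue, checking presence and content but not order, could look roughly like the sketch below. It is not the actual test: the class and method names are hypothetical, and SequenceFile entries are modeled as plain String key/value pairs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout or its tests.
class OrderInsensitiveCheck {

  // True when 'actual' holds exactly the expected key/value pairs, in any order.
  static boolean sameEntries(List<Map.Entry<String, String>> actual,
                             Map<String, String> expected) {
    Map<String, String> seen = new HashMap<>();
    for (Map.Entry<String, String> entry : actual) {
      // A repeated key means the output holds a duplicate record.
      if (seen.put(entry.getKey(), entry.getValue()) != null) {
        return false;
      }
    }
    return seen.equals(expected);
  }
}
```

A check like this passes regardless of the order in which {{listFiles}} returns entries, at the cost of no longer verifying the traversal order itself.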

