[ 
https://issues.apache.org/jira/browse/METRON-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884569#comment-15884569
 ] 

ASF GitHub Bot commented on METRON-743:
---------------------------------------

Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/467
  
    The performance penalties are minimal.  The number of files equals the 
number of reducers, which is user-specifiable and does not scale with the data.  
Also, we are sorting only the file handles here, not the contents, so OOM errors 
are very unlikely.  The contents are sorted by virtue of MapReduce, and the files 
are named in an ordered way by virtue of our custom partitioner; this change just 
ensures that the files are processed in order.
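
    A minimal sketch of the idea, not the code from the pull request.  It 
assumes hypothetical zero-padded file names and uses plain strings in place of 
Hadoop file handles; the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, for illustration only.
class SortedResultFiles {
    // Returns a new list with the file names in lexicographic order.
    // Only the names (handles) are sorted; file contents are never read.
    static List<String> sortByName(List<String> listing) {
        List<String> sorted = new ArrayList<>(listing);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical listing, as a file system might return it in arbitrary order.
        List<String> listing = Arrays.asList("part-r-00002", "part-r-00000", "part-r-00001");
        for (String name : sortByName(listing)) {
            System.out.println(name);
        }
    }
}
```

    The list being sorted has one entry per reducer, so its size is independent 
of the data volume.  Lexicographic order matches numeric order here only because 
the hypothetical names are zero-padded to a fixed width.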
    
    I'm not treating this as just a test problem.  This is a problem of our 
assumptions not being correct.  It could affect the real pcap system, not just 
the test, if people are using a non-HDFS implementation.  For HDFS, it's 
probably not an issue (honestly, I'm not even sure of that in all cases, and 
there is no guarantee the behavior will stay the same, since it's not mandated), 
but I'd rather own our assumptions than depend on Filesystem operations that do 
not necessarily conform to them.


> Sort the files when reading results from Pcap
> ---------------------------------------------
>
>                 Key: METRON-743
>                 URL: https://issues.apache.org/jira/browse/METRON-743
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Casey Stella
>
> The FileSystem.listFiles() call is not guaranteed to return the files in 
> sorted order, although we assume it does for all FileSystem implementations.  
> We should sort the results to be certain.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
