[jira] Issue Comment Edited: (IO-170) Scalable Iterator for files, better than FileUtils.iterateFiles

Vincent Bouscasse (JIRA) Mon, 30 Nov 2009 05:25:51 -0800

    [ 
https://issues.apache.org/jira/browse/IO-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783648#action_12783648
 ]


Vincent Bouscasse edited comment on IO-170 at 11/30/09 1:23 PM:
----------------------------------------------------------------

Hi Matthew. 

I'm not sure I got the point, but I think Damian was thinking about an 
iterator that allows results to be processed on the fly, as soon as they are 
available. The code in your patch does not allow this: we have to wait until 
all the result files are found before the first File object can be returned 
by iterator.next(). That can take a long time when searching large directory 
trees.

I've written a recursive Iterator<File> implementation that returns the 
first matches as soon as they're discovered. The next match is computed in the 
hasNext() method, and it uses linked lists to store pending matches and 
subdirectories. The total iteration time is the same as the current 
implementation (Commons IO 1.4), but the first results are available sooner. 
The typical usage of this iterator is in a producer thread, with the file 
processing done in a consumer thread, which speeds up the overall file 
processing.
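
For illustration only, here is a minimal sketch of an iterator of this kind. 
It is not the implementation mentioned above: the class name 
LazyFileIterator and the FileFilter parameter are assumptions, and it walks 
the tree with a work list instead of recursion. It does keep the two ideas 
described: the next match is computed in hasNext(), and linked lists hold the 
pending matches and subdirectories.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.NoSuchElementException;

/** Hypothetical sketch: yields matching files lazily, one directory listing at a time. */
final class LazyFileIterator implements Iterator<File> {
    // Directories discovered but not yet listed.
    private final Deque<File> pendingDirs = new LinkedList<>();
    // Matches found but not yet handed to the caller.
    private final Deque<File> pendingMatches = new LinkedList<>();
    private final FileFilter filter;

    LazyFileIterator(File root, FileFilter filter) {
        this.filter = filter;
        pendingDirs.add(root);
    }

    @Override
    public boolean hasNext() {
        // List one directory at a time until a match is buffered or nothing is left.
        while (pendingMatches.isEmpty() && !pendingDirs.isEmpty()) {
            File[] children = pendingDirs.removeFirst().listFiles();
            if (children == null) {
                continue; // not a directory, or it could not be read
            }
            for (File child : children) {
                if (child.isDirectory()) {
                    pendingDirs.addLast(child);
                } else if (filter.accept(child)) {
                    pendingMatches.addLast(child);
                }
            }
        }
        return !pendingMatches.isEmpty();
    }

    @Override
    public File next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return pendingMatches.removeFirst();
    }
}
```

A caller can start consuming after the first directory listing, for example 
`new LazyFileIterator(new File("."), f -> f.getName().endsWith(".txt"))`. 
The sketch does not guard against symbolic-link cycles and is not 
thread-safe by itself; a producer thread would typically hand each File to 
the consumer through a queue.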

I can provide a code sample if it matches your needs.

Best regards.




  
> Scalable Iterator for files, better than FileUtils.iterateFiles
> ---------------------------------------------------------------
>
>                 Key: IO-170
>                 URL: https://issues.apache.org/jira/browse/IO-170
>             Project: Commons IO
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 1.4
>         Environment: generic file systems
>            Reporter: Damian Noseda
>            Priority: Minor
>             Fix For: 2.x
>
>         Attachments: real_iterators.patch
>
>   Original Estimate: 5h
>  Remaining Estimate: 5h
>
> Improve the way that iterateFiles generates an iterator. The current approach 
> does not scale: it tries to add all files to a list and then returns the 
> iterator of that list. A better way would be to create a custom 
> Iterator<File> with a stack of File arrays to go up and down the directory 
> tree.
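
The stack-of-arrays idea in the description could be sketched roughly as 
follows. This is an assumption about what the reporter had in mind, not code 
from the attached patch; the names StackFileIterator and Frame are 
hypothetical. Each stack frame holds one directory listing and the position 
reached in it, so memory use depends on the depth of the tree and the size of 
the listings along the current path, not on the total number of files.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch: depth-first file iterator backed by a stack of File arrays. */
final class StackFileIterator implements Iterator<File> {
    /** One directory listing plus the index of the next entry to visit. */
    private static final class Frame {
        final File[] entries;
        int index;

        Frame(File[] entries) {
            this.entries = entries;
        }
    }

    private final Deque<Frame> stack = new ArrayDeque<>();
    private File nextFile; // looked-ahead result, or null if none is buffered

    StackFileIterator(File root) {
        push(root);
    }

    private void push(File dir) {
        File[] entries = dir.listFiles();
        if (entries != null) { // null: not a directory, or it could not be read
            stack.push(new Frame(entries));
        }
    }

    private void advance() {
        while (nextFile == null && !stack.isEmpty()) {
            Frame top = stack.peek();
            if (top.index == top.entries.length) {
                stack.pop(); // listing exhausted: go back up one level
            } else {
                File entry = top.entries[top.index++];
                if (entry.isDirectory()) {
                    push(entry); // go down one level
                } else {
                    nextFile = entry;
                }
            }
        }
    }

    @Override
    public boolean hasNext() {
        advance();
        return nextFile != null;
    }

    @Override
    public File next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        File result = nextFile;
        nextFile = null;
        return result;
    }
}
```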

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
