[ https://issues.apache.org/jira/browse/CRUNCH-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748719#comment-13748719 ]

Gabriel Reid commented on CRUNCH-256:
-------------------------------------

Good point, that sounds like the best way to go. 
                
> SequentialFileNamingScheme should cache the # of files in the target 
> directory after the first read
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-256
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-256
>             Project: Crunch
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>             Fix For: 0.8.0
>
>         Attachments: CRUNCH-256.patch
>
>
> After a job finishes running, the post-job hooks rename the files from a temp 
> output directory to the target output directory. When we have lots of files, 
> this move can take a long time, and I traced the performance issue to the 
> fact that SequentialFileNamingScheme does a listStatus() on the output 
> directory for every file that gets moved. If SequentialFileNamingScheme just 
> does this check once and then increments an internal counter, we can 
> significantly decrease the performance overhead involved with the move.
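A minimal sketch of the caching idea described above follows. It is not the CRUNCH-256 patch. The class name `CachingSequentialNamer`, the `countFiles` hook (standing in for something like Hadoop's `FileSystem.listStatus(dir).length`), and the `part-m-00000` name format are all hypothetical choices made for illustration. The directory is listed once, on the first request for a given output directory. After that, an in-memory counter supplies the next index, so moving N files costs one listing instead of N.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical sketch: a sequential file namer that lists each target directory once.
class CachingSequentialNamer {
    // Hypothetical hook standing in for an expensive call such as fs.listStatus(dir).length.
    private final ToIntFunction<String> countFiles;
    // Next free file index per output directory, filled lazily on first use.
    private final Map<String, Integer> nextIndex = new HashMap<>();

    CachingSequentialNamer(ToIntFunction<String> countFiles) {
        this.countFiles = countFiles;
    }

    // Returns the next sequential name for dir; only the first call per dir triggers a listing.
    synchronized String nextName(String dir, String type) {
        int index = nextIndex.computeIfAbsent(dir, d -> countFiles.applyAsInt(d));
        nextIndex.put(dir, index + 1); // increment the cached counter instead of re-listing
        return String.format("part-%s-%05d", type, index);
    }

    public static void main(String[] args) {
        int[] listings = {0};
        CachingSequentialNamer namer = new CachingSequentialNamer(dir -> {
            listings[0]++;
            return 3; // pretend the target directory already holds 3 files
        });
        for (int i = 0; i < 4; i++) {
            System.out.println(namer.nextName("/out", "m"));
        }
        System.out.println("listings: " + listings[0]);
    }
}
```

Running `main` prints `part-m-00003` through `part-m-00006` followed by `listings: 1`. The trade-off is that the cached count is only correct while nothing else writes into the target directory during the move. If another writer adds files concurrently, the cached names could collide with theirs.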

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
