Josh Wills created CRUNCH-256:
---------------------------------
Summary: SequentialFileNamingScheme should cache the # of files in
the target directory after the first read
Key: CRUNCH-256
URL: https://issues.apache.org/jira/browse/CRUNCH-256
Project: Crunch
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Josh Wills
Fix For: 0.8.0
After a job finishes running, the post-job hooks rename the files from a temp
output directory to the target output directory. When we have lots of files,
this move can take a long time. I traced the performance issue to the fact
that SequentialFileNamingScheme does a listStatus() on the output directory for
every file that gets moved. If SequentialFileNamingScheme did this check only
once and then incremented an internal counter, we could significantly reduce the
overhead of the move.
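
A rough sketch of the proposed caching, not the actual Crunch code: the class
name CachingSequentialNamer, the ToIntFunction standing in for a
listStatus()-based file count, and the "part-%05d" name format are all
hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical illustration of "list once, then count"; not the real
// SequentialFileNamingScheme.
class CachingSequentialNamer {
    // Assumption: stands in for an expensive call such as counting the
    // entries returned by listStatus() on the target directory.
    private final ToIntFunction<String> fileCounter;

    // Next sequence number to hand out, keyed by target directory.
    private final Map<String, Integer> nextIndexByDir = new HashMap<>();

    CachingSequentialNamer(ToIntFunction<String> fileCounter) {
        this.fileCounter = fileCounter;
    }

    // Returns the next sequential file name for the directory. The directory
    // is listed only on the first call; later calls use the cached counter.
    synchronized String nextName(String dir) {
        Integer next = nextIndexByDir.get(dir);
        if (next == null) {
            next = fileCounter.applyAsInt(dir); // the one expensive listing
        }
        nextIndexByDir.put(dir, next + 1);
        // Illustrative name format only.
        return String.format(Locale.ROOT, "part-%05d", next);
    }
}
```

With this shape, moving N files costs one directory listing instead of N. The
cached counter is only safe if nothing else writes into the target directory
while the move is in progress, which is an assumption of this sketch.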