Accumulo Output Format needs better fix for empty files (see ACCUMULO-55)
-------------------------------------------------------------------------
Key: ACCUMULO-146
URL: https://issues.apache.org/jira/browse/ACCUMULO-146
Project: Accumulo
Issue Type: Improvement
Reporter: John Vines
Assignee: John Vines
Priority: Minor
Fix For: 1.5.0
In conjunction with ACCUMULO-52, large numbers of empty files can cause
problems. In short, when a reducer receives no records (because of the
partitioner used), its output file is still created. We do not want empty files
lingering around, and we especially do not want them bulk imported. The fix
should be simple: either do not create the file until a write to it is attempted
(more complex), or delete the file at close time if no records were written
(simpler, but with the overhead of creating and then deleting the file).
Given its complexity, I do not think the lazy-creation patch should be applied
before the 1.4 release. Until then, the writer should simply delete the file
after closing it if nothing was written to it.
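A minimal sketch of the delete-on-close approach, in Java to match Accumulo. It is not the actual Accumulo output format code: the class name DeleteIfEmptyWriter is hypothetical, it writes tab-separated text to the local filesystem through java.nio instead of writing Accumulo files through Hadoop's FileSystem, and its write/close methods only mirror the shape of Hadoop's RecordWriter.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical writer illustrating the "delete at close time" strategy.
// Local java.nio files stand in for Hadoop's FileSystem (assumption).
class DeleteIfEmptyWriter {
    private final Path file;
    private final BufferedWriter out;
    private long recordsWritten = 0;

    DeleteIfEmptyWriter(Path file) {
        this.file = file;
        try {
            // The file is created eagerly, even if no record ever arrives.
            this.out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void write(String key, String value) {
        try {
            out.write(key + "\t" + value);
            out.newLine();
            recordsWritten++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void close() {
        try {
            out.close();
            if (recordsWritten == 0) {
                // Nothing was written: remove the file so it is never bulk imported.
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The overhead mentioned above is visible here: every empty reducer still pays for one file creation and one deletion.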
EDIT: As of 1.4, the RecordWriter deletes empty files on close(). I would like
to implement a more robust version that does not create a file until the first
write. I will do this for version 1.5 so as not to risk breaking things.
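A minimal sketch of the lazy-creation approach planned for 1.5, under the same assumptions: the class name LazyFileWriter is hypothetical, and local java.nio files stand in for Hadoop's FileSystem and Accumulo's file format. The output stream is opened only on the first write(), so a reducer that receives no records never creates a file.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical writer illustrating the "create on first write" strategy.
class LazyFileWriter {
    private final Path file;
    private BufferedWriter out; // stays null until the first record arrives

    LazyFileWriter(Path file) {
        this.file = file; // nothing is created on disk here
    }

    void write(String key, String value) {
        try {
            if (out == null) {
                // First record: only now does the file come into existence.
                out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
            }
            out.write(key + "\t" + value);
            out.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void close() {
        if (out == null) {
            return; // never written to: no file exists, so nothing to close or delete
        }
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The writer now has to track whether the file was ever opened, and close() must handle the never-opened case; in exchange, empty files are never created at all.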