merge code is really slow
-------------------------
Key: HADOOP-874
URL: https://issues.apache.org/jira/browse/HADOOP-874
Project: Hadoop
Issue Type: Bug
Components: io
Affects Versions: 0.10.0
Reporter: Owen O'Malley
Assigned To: Devaraj Das
Fix For: 0.11.0
I had a case where the map output buffer size (io.sort.mb) was set too low,
which caused a spill and merge. After I fixed the configuration, the map did
not spill until it was finished. With the spill it took 9.5 minutes per map;
without the spill it took 45 seconds. I therefore assume the two-file merge
was taking roughly 9 minutes, which is really slow. The inputs to the merge
were two 25 MB sequence files (default codec (Java), block compressed).
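
The estimate above can be checked with a little arithmetic. The sketch below
uses only the figures from this report; the function and constant names are
made up for illustration, and attributing the whole time difference to the
merge is the reporter's assumption, not a measurement.

```python
# Figures from the report. Attributing the full difference to the merge
# is an assumption: the spill itself also costs some time.
MAP_WITH_SPILL_S = 9.5 * 60    # 9.5 minutes per map with the spill
MAP_WITHOUT_SPILL_S = 45.0     # 45 seconds per map without the spill
MERGE_INPUT_MB = 2 * 25.0      # two 25 MB sequence files, sizes as reported


def merge_overhead_seconds(with_spill_s: float, without_spill_s: float) -> float:
    """Extra time per map when the spill and merge happen."""
    return with_spill_s - without_spill_s


def throughput_mb_per_s(input_mb: float, seconds: float) -> float:
    """Effective merge rate over the reported input size."""
    return input_mb / seconds


overhead = merge_overhead_seconds(MAP_WITH_SPILL_S, MAP_WITHOUT_SPILL_S)
rate = throughput_mb_per_s(MERGE_INPUT_MB, overhead)
print(f"overhead: {overhead:.0f} s ({overhead / 60:.2f} min)")
print(f"throughput: {rate:.3f} MB/s")
```

Under these assumptions the overhead is 525 seconds (8.75 minutes), so the
merge processes its 50 MB of input at roughly 0.1 MB per second.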