merge code is really slow
-------------------------

                 Key: HADOOP-874
                 URL: https://issues.apache.org/jira/browse/HADOOP-874
             Project: Hadoop
          Issue Type: Bug
          Components: io
    Affects Versions: 0.10.0
            Reporter: Owen O'Malley
         Assigned To: Devaraj Das
             Fix For: 0.11.0


I had a case where the map output buffer size (io.sort.mb) was set too low, 
which caused a spill and merge. After I fixed the configuration, the map did 
not spill until it was finished. With the spill, each map took 9.5 minutes. 
Without the spill, it took 45 seconds. Therefore, I assume the two-file merge 
was taking about 9 minutes, which is really slow. The inputs to the merge were 
two 25 MB sequence files (default codec (Java), block compressed).
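
As a rough check on the numbers above, the following sketch works out the extra 
time and the merge throughput it implies. It assumes that all of the extra time 
is spent in the merge (the spill itself is ignored) and that the 25 MB figure is 
the compressed on-disk size; the function names are made up for illustration 
and are not part of Hadoop.

```python
# Back-of-the-envelope estimate from the timings quoted in the report.
# Assumption: the whole difference between the two runs is merge time.
SPILL_RUN_SECONDS = 9.5 * 60   # 9.5 minutes per map with the spill
NO_SPILL_RUN_SECONDS = 45      # 45 seconds per map without the spill
MERGE_INPUT_MB = 2 * 25        # two 25 MB sequence files (compressed size)


def merge_overhead_seconds(with_spill, without_spill):
    """Extra wall-clock time attributed to the spill and merge."""
    return with_spill - without_spill


def throughput_mb_per_second(size_mb, seconds):
    """Implied merge throughput over the given input size."""
    return size_mb / seconds


overhead = merge_overhead_seconds(SPILL_RUN_SECONDS, NO_SPILL_RUN_SECONDS)
rate = throughput_mb_per_second(MERGE_INPUT_MB, overhead)

print(f"extra time attributed to the merge: {overhead / 60:.2f} min")
print(f"implied merge throughput: {rate:.3f} MB/s")
```

Under these assumptions the merge accounts for 8.75 minutes and processes its 
50 MB of input at roughly 0.1 MB/s.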


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
