[ https://issues.apache.org/jira/browse/HADOOP-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640158#action_12640158 ]

Jothi Padmanabhan commented on HADOOP-4396:
-------------------------------------------

One difference between LocalFileSystem (LFS) and RawLocalFileSystem (RFS) is 
that all reads and writes issued through LFS arrive at the RFS layer in chunks 
of 512 bytes. I tried to mimic this behavior in IFileInputStream and 
IFileOutputStream by reading and writing in 1 KB chunks, and the performance 
degradation disappeared. What I did was something like:
 
{code}

public void write(byte[] b, int off, int len) throws IOException {
  int bytesWritten = 0;
  while (bytesWritten < len) {
    // hand the underlying stream at most 1 KB per call
    int bytesToWrite = Math.min(len - bytesWritten, 1024);
    out.write(b, off + bytesWritten, bytesToWrite);
    bytesWritten += bytesToWrite;
  }
}

{code}

The read path does the same. Any thoughts on why this should work?
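For reference, a minimal sketch of what the read side could look like; the class and method names here are illustrative, not the actual IFileInputStream change, and the 1 KB chunk size is the same assumption as above:

{code}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedRead {
  static final int CHUNK = 1024; // assumed chunk size, matching the write side

  // Read up to len bytes, asking the underlying stream for at most
  // CHUNK bytes per call. Returns the number of bytes read, or -1 on
  // immediate EOF, mirroring InputStream.read semantics.
  static int read(InputStream in, byte[] b, int off, int len) throws IOException {
    int bytesRead = 0;
    while (bytesRead < len) {
      int toRead = Math.min(len - bytesRead, CHUNK);
      int n = in.read(b, off + bytesRead, toRead);
      if (n < 0) {
        return bytesRead == 0 ? -1 : bytesRead;
      }
      bytesRead += n;
    }
    return bytesRead;
  }

  public static void main(String[] args) throws IOException {
    byte[] src = new byte[4096];
    byte[] dst = new byte[4096];
    int n = read(new ByteArrayInputStream(src), dst, 0, dst.length);
    System.out.println(n); // reads the full 4096 bytes in 1 KB chunks
  }
}
{code}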

> sort on 400 nodes is now slower than in 18
> ------------------------------------------
>
>                 Key: HADOOP-4396
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4396
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Jothi Padmanabhan
>            Assignee: Jothi Padmanabhan
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> Sort on 400 nodes on  hadoop release 18 takes about 29 minutes, but with the 
> 19 branch takes about 32 minutes. This behavior is consistent.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.