[ https://issues.apache.org/jira/browse/HDFS-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp updated HDFS-6214:
------------------------------

    Attachment: HDFS-6214.patch

Jetty's chunked responses reserve 12 bytes at the beginning of the 
buffer.  If you write the full buffer size, the last 12 bytes spill over into 
another buffer, which again has 12 reserved bytes.  The solution is to write & 
flush the buffer size minus 12.  The difference is dramatic: 10 MB/s before vs. 
80 MB/s after, which was probably hitting the network saturation point.
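
For illustration, a minimal sketch of the capped write-and-flush loop.  The 
class and constant names and the copy loop below are mine, not the actual 
patch; the 12-byte figure is the Jetty chunk-header reserve described above.

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    public class ChunkedCopy {
      // Jetty reserves 12 bytes at the head of each response buffer for
      // the chunk header, so cap each write at bufferSize - 12 to keep a
      // single flush from spilling into a second (also reserved) buffer.
      private static final int CHUNK_HEADER_RESERVE = 12;

      public static void copy(InputStream in, OutputStream out,
          int bufferSize) throws IOException {
        int chunkSize = bufferSize - CHUNK_HEADER_RESERVE;
        byte[] buf = new byte[chunkSize];
        int n;
        while ((n = in.read(buf)) > 0) {
          out.write(buf, 0, n);
          out.flush(); // flush each chunk so it fits in one response buffer
        }
      }
    }

Writing at most chunkSize bytes per flush means each chunk lands in a single 
Jetty buffer instead of straddling two, which is what the patch exploits.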

No test, because it's rather difficult to write a performance test for big 
files.  We've been running this change internally for months.

> Webhdfs has poor throughput for files >2GB
> ------------------------------------------
>
>                 Key: HDFS-6214
>                 URL: https://issues.apache.org/jira/browse/HDFS-6214
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HDFS-6214.patch
>
>
> For the DN's open call, Jetty returns a Content-Length header for files <2GB, 
> and uses chunking for files >2GB.  A "bug" in Jetty's buffer handling results 
> in a ~8X reduction in throughput.



