[
https://issues.apache.org/jira/browse/HDFS-6214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971512#comment-13971512
]
Daryn Sharp commented on HDFS-6214:
-----------------------------------
I think the question should be: is flush() necessary for _chunked_ transfers?
It doesn't flush today, so arguably it may not be. The non-chunked case is the
one that exhibits the buffer + 12-byte write pattern, and that is where
flushing is necessary.
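
To illustrate what "always flush" looks like, here is a minimal sketch (not the
actual patch; the helper name and buffer size are made up): flushing after each
buffered write pushes the output buffer to the socket instead of letting it be
coalesced into one full-buffer write followed by a tiny remainder.

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not the HDFS-6214 patch: copy a stream to the
// response output, flushing after every write.
class FlushingCopy {
  static void copy(InputStream in, OutputStream out, int bufSize)
      throws IOException {
    byte[] buf = new byte[bufSize];
    int n;
    while ((n = in.read(buf)) > 0) {
      out.write(buf, 0, n);
      out.flush(); // always flush: chunked transfers are unaffected,
                   // non-chunked (Content-Length) transfers avoid the slow path
    }
  }
}
{code}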
Chunked performance is the baseline. I double-checked with our performance
people: the change did not affect chunked performance, while it significantly
boosted non-chunked performance. I chose to always flush for simplicity.
For other watchers: we have been running this change in production for almost
a month.
> Webhdfs has poor throughput for files >2GB
> ------------------------------------------
>
> Key: HDFS-6214
> URL: https://issues.apache.org/jira/browse/HDFS-6214
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: webhdfs
> Affects Versions: 2.0.0-alpha, 3.0.0
> Reporter: Daryn Sharp
> Assignee: Daryn Sharp
> Attachments: HDFS-6214.patch
>
>
> For the DN's open call, jetty returns a Content-Length header for files <2GB,
> and uses chunking for files >2GB. A "bug" in jetty's buffer handling results
> in a ~8X reduction in throughput.
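
For anyone who wants to confirm which mode a given transfer uses, checking the
response headers is enough. A small sketch follows; the datanode address and
file path are placeholders, not values from this issue.

{code:java}
import java.net.HttpURLConnection;
import java.net.URL;

// Per the description above: files <2GB should come back with Content-Length,
// files >2GB with Transfer-Encoding: chunked. Substitute your own DN and path.
public class TransferModeCheck {
  public static void main(String[] args) throws Exception {
    URL url = new URL(
        "http://datanode.example.com:50075/webhdfs/v1/tmp/bigfile?op=OPEN");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    System.out.println("Content-Length:    " + conn.getHeaderField("Content-Length"));
    System.out.println("Transfer-Encoding: " + conn.getHeaderField("Transfer-Encoding"));
    conn.disconnect();
  }
}
{code}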
--
This message was sent by Atlassian JIRA
(v6.2#6252)