[
https://issues.apache.org/jira/browse/HDFS-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799856#action_12799856
]
Joydeep Sen Sarma commented on HDFS-895:
----------------------------------------
i think it's worth verifying that this would actually help hbase throughput
(just a theory right now i think).
we could set the hbase queue threshold to 1 and test with a fake sync (one that
just returns immediately) and a real sync, and see what the difference is: is
the sync time really holding back overall throughput, as intuition says it
should be?
also - the proposal is to overlap actual network traffic and not just the
buffer copies across app/dfs - right?
> Allow hflush/sync to occur in parallel with new writes to the file
> ------------------------------------------------------------------
>
> Key: HDFS-895
> URL: https://issues.apache.org/jira/browse/HDFS-895
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs client
> Reporter: dhruba borthakur
>
> In the current trunk, the HDFS client methods writeChunk() and hflush/sync
> are synchronized. This means that if a hflush/sync is in progress, an
> application cannot write data to the HDFS client buffer. This reduces the
> write throughput of the transaction log in HBase.
> The hflush/sync should allow new writes to happen to the HDFS client even
> when a hflush/sync is in progress. It can record the seqno of the message for
> which it should receive the ack, indicate to the DataStream thread to start
> flushing those messages, exit the synchronized section and just wait for that
> ack to arrive.
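
The locking pattern proposed in the description above could look roughly like
the sketch below. It is only an illustration under stated assumptions, not the
actual HDFS client code: the class FlushSketch, its fields, and the in-process
"streamer" thread that acks every packet immediately are all hypothetical
stand-ins. What it shows is hflush() holding the writer lock just long enough
to record the target seqno, then waiting for the ack on a separate monitor so
that writeChunk() is not blocked in the meantime.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: names are illustrative and do not match the real HDFS client.
class FlushSketch {
    private final BlockingQueue<Long> dataQueue = new LinkedBlockingQueue<>();
    private final Object ackLock = new Object();
    private long lastQueuedSeqno = -1; // guarded by "this"
    private long lastAckedSeqno = -1;  // guarded by ackLock

    FlushSketch() {
        // Stand-in for the streamer thread: it "acks" each packet as soon as it
        // is dequeued. A real client would send it and wait for the pipeline ack.
        Thread streamer = new Thread(() -> {
            try {
                while (true) {
                    long seqno = dataQueue.take();
                    synchronized (ackLock) {
                        lastAckedSeqno = seqno;
                        ackLock.notifyAll();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        streamer.setDaemon(true);
        streamer.start();
    }

    // Writers hold the lock only long enough to assign a seqno and enqueue.
    synchronized long writeChunk(byte[] chunk) {
        long seqno = ++lastQueuedSeqno;
        dataQueue.add(seqno);
        return seqno;
    }

    // Record the seqno to wait for inside the synchronized section, then leave
    // it and wait for the ack without blocking new writes.
    long hflush() {
        long target;
        synchronized (this) {
            target = lastQueuedSeqno;
        }
        synchronized (ackLock) {
            try {
                while (lastAckedSeqno < target) {
                    ackLock.wait();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for ack", e);
            }
        }
        return target;
    }
}
```

Because the ack wait uses ackLock rather than the writer lock, a second thread
can keep calling writeChunk() while hflush() is still waiting. Chunks written
after the target seqno was recorded are not covered by that flush.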