[jira] Commented: (HDFS-895) Allow hflush/sync to occur in parallel with new writes to the file

Todd Lipcon (JIRA) Fri, 12 Nov 2010 10:19:42 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931477#action_12931477
 ]


Todd Lipcon commented on HDFS-895:
----------------------------------

bq. I still have a question: if lastFlushOffset == bytesCurBlock, when will 
this condition be true: oldCurrentPacket != null && currentPacket != null?

I don't think that will ever be true. We do get the case {{oldCurrentPacket == 
null && currentPacket == null}} though when we call flush twice at the 
beginning of any block. So I think we can add an assert {{assert 
oldCurrentPacket == null}} in that else clause.

bq. Please understand I did not mean to give you a hard time

No worries - I agree that this code is very tricky, which is why I'd like to 
keep the asserts at this point. The assert guards what we all thought was an 
invariant: sequence numbers should increase by exactly one with every packet. 
Nicolas also reviewed this code in depth a few months back, which is when we 
added this new {{currentSeqno--}} bit. If I recall correctly we discussed a lot 
whether there was any bug where we could skip or repeat a sequence number, and 
when we added the assert for in-order no-skipping sequence numbers, we found 
this bug.
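
To make the invariant concrete, here is a minimal sketch of a check for in-order, no-skipping sequence numbers. It is not the code from the patch: {{SeqnoAllocator}}, {{SeqnoInvariant}}, {{rollbackLast}} and {{onQueued}} are hypothetical names, and the rollback only assumes one way a {{currentSeqno--}} style adjustment could return an unused number. The check throws an exception instead of using {{assert}} so that it fires even when assertions are disabled.

```java
/** Hypothetical allocator of packet sequence numbers (illustration only). */
final class SeqnoAllocator {
    private long currentSeqno = 0;

    /** Hands out the next sequence number. */
    synchronized long next() {
        return currentSeqno++;
    }

    /** Returns the most recently issued number because its packet was never queued. */
    synchronized void rollbackLast() {
        currentSeqno--;
    }
}

/** Verifies that every queued packet is numbered exactly one above the previous one. */
final class SeqnoInvariant {
    private long lastQueued = -1;

    synchronized void onQueued(long seqno) {
        if (seqno != lastQueued + 1) {
            throw new IllegalStateException(
                "expected seqno " + (lastQueued + 1) + " but got " + seqno);
        }
        lastQueued = seqno;
    }
}
```

In this sketch, a number that is allocated and then dropped without a rollback makes the next {{onQueued}} call fail, which is the kind of skip the assert is meant to catch.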

Would it be better to open a very small JIRA to add the assert and the fix for 
it, commit that, and then commit this as an optimization? That would keep the 
two changes orthogonal and maybe easier to understand.

> Allow hflush/sync to occur in parallel with new writes to the file
> ------------------------------------------------------------------
>
>                 Key: HDFS-895
>                 URL: https://issues.apache.org/jira/browse/HDFS-895
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Todd Lipcon
>             Fix For: 0.22.0
>
>         Attachments: 895-delta-for-review.txt, hdfs-895-0.20-append.txt, 
> hdfs-895-20.txt, hdfs-895-review.txt, hdfs-895-trunk.txt, hdfs-895.txt, 
> hdfs-895.txt, hdfs-895.txt
>
>
> In the current trunk, the HDFS client methods writeChunk() and hflush/sync 
> are synchronized. This means that if a hflush/sync is in progress, an 
> application cannot write data to the HDFS client buffer. This reduces the 
> write throughput of the transaction log in HBase. 
> The hflush/sync should allow new writes to happen to the HDFS client even 
> when a hflush/sync is in progress. It can record the seqno of the message for 
> which it should receive the ack, indicate to the DataStream thread to start 
> flushing those messages, exit the synchronized section and just wait for that 
> ack to arrive.
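
The approach in the description above can be sketched roughly as follows. This is a simplified illustration rather than the DFSClient implementation: the class {{FlushableStream}}, its two locks and all method names are assumptions, and the streamer and ack threads are reduced to plain method calls. The point is that the flush records the sequence number under the writer lock, releases that lock, and then waits for the ack on a separate monitor.

```java
/** Illustrative stream whose flush does not hold the writer lock while waiting. */
final class FlushableStream {
    private final Object writeLock = new Object(); // guards lastQueuedSeqno
    private final Object ackLock = new Object();   // guards lastAckedSeqno
    private long lastQueuedSeqno = -1;
    private long lastAckedSeqno = -1;

    /** Writer path: queue one packet inside a short critical section. */
    long writePacket() {
        synchronized (writeLock) {
            return ++lastQueuedSeqno;
        }
    }

    /** Called by a (hypothetical) ack-processing thread when an ack arrives. */
    void ackReceived(long seqno) {
        synchronized (ackLock) {
            lastAckedSeqno = Math.max(lastAckedSeqno, seqno);
            ackLock.notifyAll();
        }
    }

    /** Record which seqno to wait for, leave the writer lock, then wait for its ack. */
    void hflush() throws InterruptedException {
        long toWaitFor;
        synchronized (writeLock) {
            toWaitFor = lastQueuedSeqno;
        }
        // Writers can call writePacket() again from here on.
        synchronized (ackLock) {
            while (lastAckedSeqno < toWaitFor) {
                ackLock.wait();
            }
        }
    }
}
```

Because {{hflush()}} only waits for the sequence number it recorded, packets written after the flush began do not delay its return.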

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
