[
https://issues.apache.org/jira/browse/HDFS-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931477#action_12931477
]
Todd Lipcon commented on HDFS-895:
----------------------------------
bq. I still have a question, if lastFlushOffset == bytesCurBlock, when will
this condition be true: oldCurrentPacket != null && currentPacket != null?
I don't think that will ever be true. We do get the case {{oldCurrentPacket ==
null && currentPacket == null}} though when we call flush twice at the
beginning of any block. So I think we can add an assert {{assert
oldCurrentPacket == null}} in that else clause.
bq. Please understand I did not mean to give you a hard time
No worries - I agree that this code is very tricky, which is why I'd like to
keep the asserts at this point. The assert guards what we all thought was an
invariant: sequence numbers should increase by exactly one with every packet.
Nicolas also reviewed this code in depth a few months back, which is when we
added this new {{currentSeqno--}} bit. If I recall correctly we discussed a lot
whether there was any bug where we could skip or repeat a sequence number, and
when we added the assert for in-order no-skipping sequence numbers, we found
this bug.
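To make the invariant concrete, here is a minimal sketch of the kind of check I mean. It is not the DFSClient code: the class and method names are made up for illustration, and the "packet allocated but dropped before being queued" case is only an assumed example of where a decrement like {{currentSeqno--}} matters. It uses an explicit check rather than a Java {{assert}} so it fires even when assertions are disabled.

```java
/** Hypothetical sketch of the seqno invariant; not the actual DFSClient code. */
class SeqnoInvariantSketch {
    private long currentSeqno = 0;      // next seqno to hand out
    private long lastQueuedSeqno = -1;  // seqno of the last packet queued

    /** Allocate a seqno for a new packet. */
    long allocate() {
        return currentSeqno++;
    }

    /** Give back the most recently allocated seqno when its packet is dropped unqueued. */
    void discardUnqueued() {
        currentSeqno--;
    }

    /** Queue a packet, enforcing "exactly one more than the previous packet". */
    void queue(long seqno) {
        if (seqno != lastQueuedSeqno + 1) {
            throw new IllegalStateException(
                "seqno " + seqno + " queued after " + lastQueuedSeqno);
        }
        lastQueuedSeqno = seqno;
    }

    public static void main(String[] args) {
        SeqnoInvariantSketch s = new SeqnoInvariantSketch();
        s.queue(s.allocate());   // first packet
        s.allocate();            // second seqno allocated, packet then dropped
        s.discardUnqueued();     // without this, the next queue() would throw
        s.queue(s.allocate());   // same seqno reused, invariant holds
        System.out.println("invariant held");
    }
}
```

If the {{discardUnqueued()}} call is left out, the next {{queue()}} fails immediately, which is the sort of skipped sequence number the check is there to catch.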
Would it be better to open a very small JIRA to add the assert and fix for it,
commit that, then commit this as an optimization? That would keep the two
changes orthogonal and maybe easier to understand?
> Allow hflush/sync to occur in parallel with new writes to the file
> ------------------------------------------------------------------
>
> Key: HDFS-895
> URL: https://issues.apache.org/jira/browse/HDFS-895
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs client
> Affects Versions: 0.22.0
> Reporter: dhruba borthakur
> Assignee: Todd Lipcon
> Fix For: 0.22.0
>
> Attachments: 895-delta-for-review.txt, hdfs-895-0.20-append.txt,
> hdfs-895-20.txt, hdfs-895-review.txt, hdfs-895-trunk.txt, hdfs-895.txt,
> hdfs-895.txt, hdfs-895.txt
>
>
> In the current trunk, the HDFS client methods writeChunk() and hflush/sync
> are synchronized. This means that if a hflush/sync is in progress, an
> application cannot write data to the HDFS client buffer. This reduces the
> write throughput of the transaction log in HBase.
> The hflush/sync should allow new writes to happen to the HDFS client even
> when a hflush/sync is in progress. It can record the seqno of the message for
> which it should receive the ack, indicate to the DataStream thread to start
> flushing those messages, exit the synchronized section and just wait for that
> ack to arrive.
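
The approach in the description above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch: the class, the separate {{ackLock}} monitor, and the method names are all hypothetical, and the streamer thread is reduced to a single {{onAck()}} callback.

```java
/** Hypothetical sketch of "record seqno, leave the lock, wait for ack". */
class ParallelFlushSketch {
    private final Object ackLock = new Object(); // guards lastAckedSeqno
    private long lastQueuedSeqno = -1;           // guarded by 'this'
    private long lastAckedSeqno = -1;            // guarded by ackLock

    /** Writer path: holds the stream lock only long enough to queue a packet. */
    synchronized long writePacket() {
        return ++lastQueuedSeqno;
    }

    /** Called by the (assumed) streamer thread when an ack arrives. */
    void onAck(long seqno) {
        synchronized (ackLock) {
            if (seqno > lastAckedSeqno) {
                lastAckedSeqno = seqno;
            }
            ackLock.notifyAll();
        }
    }

    /** Flush path: record the seqno under the stream lock, then wait outside it. */
    void hflush() throws InterruptedException {
        long toWaitFor;
        synchronized (this) {
            toWaitFor = lastQueuedSeqno;  // last packet this flush must cover
        }
        // The stream lock is released here, so writePacket() can proceed.
        synchronized (ackLock) {
            while (lastAckedSeqno < toWaitFor) {
                ackLock.wait();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ParallelFlushSketch out = new ParallelFlushSketch();
        long seqno = out.writePacket();
        out.onAck(seqno);   // pretend the pipeline acked the packet
        out.hflush();       // returns at once: everything queued is acked
        System.out.println("flushed through seqno " + seqno);
    }
}
```

The point of the two monitors is that a thread blocked in {{hflush()}} holds neither the stream lock nor, while it is inside {{wait()}}, the ack lock, so writers keep making progress during the flush.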