[ 
https://issues.apache.org/jira/browse/HDFS-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16041804#comment-16041804
 ] 

Andrew Wang commented on HDFS-11882:
------------------------------------

Thanks for working on this, Akira. I looked at this part of the code again, and 
there are things that could be improved and things that I don't understand.

{code}
  /**
   * Get the number of acked stripes. An acked stripe means at least data block
   * number size cells of the stripe were acked.
   */
  private long getNumAckedStripes() {
{code}

Although the javadoc says "at least data block number size cells", the method 
doesn't check this. I think this is okay since the callers validate that 
there are enough healthy streamers, but at a minimum the javadoc should be 
updated, or additional checks added.
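
As a minimal sketch of what an explicit check could look like (this is not the actual DFSStripedOutputStream code; the class name, the method signature, and the per-streamer acked-byte array are all hypothetical), a stripe can be counted as acked only when at least numDataBlocks streamers have acked its cell:

```java
import java.util.Arrays;

// Hypothetical helper, not part of Hadoop: counts stripes for which at least
// numDataBlocks streamers have acked a full cell.
final class AckedStripes {
  private AckedStripes() {}

  /**
   * @param ackedBytes    bytes acked by each streamer (assumed input)
   * @param cellSize      size of one cell in bytes, must be positive
   * @param numDataBlocks number of data blocks in the erasure coding schema
   */
  static long numAckedStripes(long[] ackedBytes, int cellSize,
      int numDataBlocks) {
    if (cellSize <= 0 || numDataBlocks <= 0) {
      throw new IllegalArgumentException("cellSize and numDataBlocks must be positive");
    }
    if (ackedBytes.length < numDataBlocks) {
      return 0; // not enough streamers to ack even one stripe
    }
    long[] sorted = ackedBytes.clone();
    Arrays.sort(sorted); // ascending order
    // The numDataBlocks-th largest acked length bounds how many stripes
    // have at least numDataBlocks acked cells.
    long kthLargest = sorted[sorted.length - numDataBlocks];
    return kthLargest / cellSize;
  }
}
```

For example, with 3 data blocks, a cell size of 4 bytes, and acked byte counts {8, 8, 8, 4, 0}, this returns 2. With {8, 8, 4, 4, 0} it returns 1, because only two streamers have acked the second stripe's cell.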

[Walter's intent though was to only count full 
stripes|https://issues.apache.org/jira/browse/HDFS-9342?focusedCommentId=15072472&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15072472],
 but I don't understand why this is correct. When closing a file at a 
non-stripe boundary, the last stripe is necessarily not a full stripe. In this 
situation, shouldn't the length extend (as the getNumAckedStripes javadoc 
implies) to the last stripe that has at least numDataBlocks acked cells?

What's the effect of truncating the file length to the last full stripe? Does 
this truncate the file? When updatePipeline is called while closing the file, I 
don't see any logic to rewrite that last partial stripe.
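
To make the question concrete, here is a small sketch of the arithmetic only. It does not reproduce what the client does; the class and method names are hypothetical, and the 64 KiB cell size in the example that follows is an assumption chosen for illustration.

```java
// Hypothetical helper, not part of Hadoop: shows how many bytes fall outside
// the last full stripe when a file is closed at a non-stripe boundary.
final class StripeLength {
  private StripeLength() {}

  /** Bytes of user data in one full stripe. */
  static long stripeSize(int cellSize, int numDataBlocks) {
    return (long) cellSize * numDataBlocks;
  }

  /** File length rounded down to the last full stripe boundary. */
  static long lengthAtLastFullStripe(long bytesWritten, int cellSize,
      int numDataBlocks) {
    long stripe = stripeSize(cellSize, numDataBlocks);
    return (bytesWritten / stripe) * stripe;
  }

  /** Bytes in the trailing partial stripe (0 if closed on a boundary). */
  static long partialStripeBytes(long bytesWritten, int cellSize,
      int numDataBlocks) {
    return bytesWritten % stripeSize(cellSize, numDataBlocks);
  }
}
```

With 10 data blocks and an assumed 64 KiB cell, a full stripe holds 655360 bytes. A file of 655460 bytes rounds down to 655360, leaving 100 bytes in a partial stripe that a full-stripe-only count would not cover.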

I'm hoping some of these questions can be investigated via unit test.

Ping [~walter.k.su] / [~zhz] for inputs.

> Client fails if acknowledged size is greater than bytes sent
> ------------------------------------------------------------
>
>                 Key: HDFS-11882
>                 URL: https://issues.apache.org/jira/browse/HDFS-11882
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding, test
>            Reporter: Akira Ajisaka
>            Assignee: Akira Ajisaka
>         Attachments: HDFS-11882.01.patch
>
>
> Some erasure coding tests fail with the following exception. The following 
> test was removed by HDFS-11823; however, this type of error can happen in a 
> real cluster.
> {noformat}
> Running org.apache.hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure
> Tests run: 14, Failures: 0, Errors: 1, Skipped: 10, Time elapsed: 89.086 sec <<< FAILURE! - in org.apache.hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure
> testMultipleDatanodeFailure56(org.apache.hadoop.hdfs.TestDFSRSDefault10x4StripedOutputStreamWithFailure)  Time elapsed: 38.831 sec  <<< ERROR!
> java.lang.IllegalStateException: null
>       at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
>       at org.apache.hadoop.hdfs.DFSStripedOutputStream.updatePipeline(DFSStripedOutputStream.java:780)
>       at org.apache.hadoop.hdfs.DFSStripedOutputStream.checkStreamerFailures(DFSStripedOutputStream.java:664)
>       at org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:1034)
>       at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:842)
>       at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>       at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
>       at org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTest(TestDFSStripedOutputStreamWithFailure.java:472)
>       at org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTestWithMultipleFailure(TestDFSStripedOutputStreamWithFailure.java:381)
>       at org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.testMultipleDatanodeFailure56(TestDFSStripedOutputStreamWithFailure.java:245)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>       at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>       at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}



