[
https://issues.apache.org/jira/browse/HDFS-16544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Takanobu Asanuma resolved HDFS-16544.
-------------------------------------
    Fix Version/s: 3.4.0
                   3.2.4
                   3.3.4
         Assignee: qinyuren
       Resolution: Fixed
> EC decoding failed due to invalid buffer
> ----------------------------------------
>
> Key: HDFS-16544
> URL: https://issues.apache.org/jira/browse/HDFS-16544
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: erasure-coding
> Reporter: qinyuren
> Assignee: qinyuren
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> In [HDFS-16538|https://issues.apache.org/jira/browse/HDFS-16538], we
> found an EC file decoding bug that occurs when more than one data block read fails.
> We have now found another bug, triggered by #StatefulStripeReader.decode.
> If we read an EC file whose {*}length is more than one stripe{*}, and this file
> has *one data block* and *the first parity block* corrupted, this error will
> happen.
> {code:java}
> org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer found, not allowing null
>     at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers(ByteBufferDecodingState.java:132)
>     at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:48)
>     at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
>     at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
>     at org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:435)
>     at org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94)
>     at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:392)
>     at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:315)
>     at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:408)
>     at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:918)
> {code}
>
> Let's say we use ec(6+3), and data block[0] and the first parity block[6]
> are corrupted.
> # The readers for block[0] and block[6] will be closed after reading the
> first stripe of the EC file;
> # When the client reads the second stripe of the EC file, it triggers
> #prepareParityChunk for block[6];
> # decodeInputs[6] will not be constructed, because the reader for
> block[6] was closed.
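> The steps above can be reproduced in isolation: the decoder validates every
> buffer handed to it and rejects nulls, which is exactly what happens when
> decodeInputs[6] is left unconstructed. Below is a minimal standalone sketch of
> that check (using a plain IllegalArgumentException and a hypothetical
> validateBuffers helper in place of Hadoop's
> ByteBufferDecodingState.checkOutputBuffers):

```java
import java.nio.ByteBuffer;

public class BufferCheckDemo {
    // Hypothetical stand-in for ByteBufferDecodingState.checkOutputBuffers:
    // every buffer handed to the decoder must be non-null.
    static void validateBuffers(ByteBuffer[] buffers) {
        for (ByteBuffer buf : buffers) {
            if (buf == null) {
                throw new IllegalArgumentException(
                    "Invalid buffer found, not allowing null");
            }
        }
    }

    public static void main(String[] args) {
        // ec(6+3): 6 data blocks + 3 parity blocks = 9 decode inputs.
        ByteBuffer[] decodeInputs = new ByteBuffer[9];
        for (int i = 0; i < decodeInputs.length; i++) {
            // block[0] and block[6] are corrupted; decodeInputs[6] is never
            // constructed because its reader was closed after the first stripe.
            if (i != 0 && i != 6) {
                decodeInputs[i] = ByteBuffer.allocate(64);
            }
        }
        try {
            validateBuffers(decodeInputs);
        } catch (IllegalArgumentException e) {
            // prints: decode failed: Invalid buffer found, not allowing null
            System.out.println("decode failed: " + e.getMessage());
        }
    }
}
```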
>
> {code:java}
> boolean prepareParityChunk(int index) {
>   Preconditions.checkState(index >= dataBlkNum
>       && alignedStripe.chunks[index] == null);
>   if (readerInfos[index] != null && readerInfos[index].shouldSkip) {
>     alignedStripe.chunks[index] = new StripingChunk(StripingChunk.MISSING);
>     // we have failed the block reader before
>     return false;
>   }
>   final int parityIndex = index - dataBlkNum;
>   ByteBuffer buf = dfsStripedInputStream.getParityBuffer().duplicate();
>   buf.position(cellSize * parityIndex);
>   buf.limit(cellSize * parityIndex + (int) alignedStripe.range.spanInBlock);
>   decodeInputs[index] =
>       new ECChunk(buf.slice(), 0, (int) alignedStripe.range.spanInBlock);
>   alignedStripe.chunks[index] =
>       new StripingChunk(decodeInputs[index].getBuffer());
>   return true;
> } {code}
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]