sodonnel opened a new pull request, #4180:
URL: https://github.com/apache/ozone/pull/4180
## What changes were proposed in this pull request?
When calculating a checksum for an EC file with Rack Topology enabled, you
can get the following error intermittently:
```
ERROR : Failed with exception null
java.lang.IndexOutOfBoundsException
at java.nio.ByteBuffer.wrap(ByteBuffer.java:375)
at
org.apache.hadoop.ozone.client.checksum.ECBlockChecksumComputer.computeCompositeCrc(ECBlockChecksumComputer.java:163)
at
org.apache.hadoop.ozone.client.checksum.ECBlockChecksumComputer.compute(ECBlockChecksumComputer.java:65)
at
org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.getBlockChecksumFromChunkChecksums(ECFileChecksumHelper.java:148)
at
org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.checksumBlock(ECFileChecksumHelper.java:106)
at
org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.checksumBlocks(ECFileChecksumHelper.java:73)
at
org.apache.hadoop.ozone.client.checksum.BaseFileChecksumHelper.compute(BaseFileChecksumHelper.java:220)
at
org.apache.hadoop.fs.ozone.OzoneClientUtils.getFileChecksumWithCombineMode(OzoneClientUtils.java:223)
at
org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.getFileChecksum(BasicRootedOzoneClientAdapterImpl.java:1123)
at
org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.getFileChecksum(BasicRootedOzoneFileSystem.java:955)
at
org.apache.hadoop.fs.FileSystem.getFileChecksum(FileSystem.java:2831)
at
org.apache.hadoop.hive.ql.metadata.Hive.addInsertNonDirectoryInformation(Hive.java:3659)
at
org.apache.hadoop.hive.ql.metadata.Hive.addInsertFileInformation(Hive.java:3632)
...
ERROR : FAILED: Execution Error, return code 40000 from
org.apache.hadoop.hive.ql.exec.MoveTask. java.lang.IndexOutOfBoundsException
at java.nio.ByteBuffer.wrap(ByteBuffer.java:375)
at
org.apache.hadoop.ozone.client.checksum.ECBlockChecksumComputer.computeCompositeCrc(ECBlockChecksumComputer.java:163)
at
org.apache.hadoop.ozone.client.checksum.ECBlockChecksumComputer.compute(ECBlockChecksumComputer.java:65)
at
org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.getBlockChecksumFromChunkChecksums(ECFileChecksumHelper.java:148)
at
org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.checksumBlock(ECFileChecksumHelper.java:106)
at
org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.checksumBlocks(ECFileChecksumHelper.java:73)
at
org.apache.hadoop.ozone.client.checksum.BaseFileChecksumHelper.compute(BaseFileChecksumHelper.java:220)
at
org.apache.hadoop.fs.ozone.OzoneClientUtils.getFileChecksumWithCombineMode(OzoneClientUtils.java:223)
at
org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.getFileChecksum(BasicRootedOzoneClientAdapterImpl.java:1123)
at
org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.getFileChecksum(BasicRootedOzoneFileSystem.java:955)
at
org.apache.hadoop.fs.FileSystem.getFileChecksum(FileSystem.java:2831)
at
org.apache.hadoop.hive.ql.metadata.Hive.addInsertNonDirectoryInformation(Hive.java:3659)
at
org.apache.hadoop.hive.ql.metadata.Hive.addInsertFileInformation(Hive.java:3632)
...
INFO : Completed executing
command(queryId=hive_20221214035652_bc45477d-98df-408e-b945-a63b4ac6896a); Time
taken: 22.167 seconds
INFO : OK
Error: Error while compiling statement: FAILED: Execution Error, return
code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask.
java.lang.IndexOutOfBoundsException
at java.nio.ByteBuffer.wrap(ByteBuffer.java:375)
at
org.apache.hadoop.ozone.client.checksum.ECBlockChecksumComputer.computeCompositeCrc(ECBlockChecksumComputer.java:163)
at
org.apache.hadoop.ozone.client.checksum.ECBlockChecksumComputer.compute(ECBlockChecksumComputer.java:65)
at
org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.getBlockChecksumFromChunkChecksums(ECFileChecksumHelper.java:148)
at
org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.checksumBlock(ECFileChecksumHelper.java:106)
at
org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.checksumBlocks(ECFileChecksumHelper.java:73)
at
org.apache.hadoop.ozone.client.checksum.BaseFileChecksumHelper.compute(BaseFileChecksumHelper.java:220)
at
org.apache.hadoop.fs.ozone.OzoneClientUtils.getFileChecksumWithCombineMode(OzoneClientUtils.java:223)
at
org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.getFileChecksum(BasicRootedOzoneClientAdapterImpl.java:1123)
at
org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.getFileChecksum(BasicRootedOzoneFileSystem.java:955)
at
org.apache.hadoop.fs.FileSystem.getFileChecksum(FileSystem.java:2831)
at
org.apache.hadoop.hive.ql.metadata.Hive.addInsertNonDirectoryInformation(Hive.java:3659)
at
org.apache.hadoop.hive.ql.metadata.Hive.addInsertFileInformation(Hive.java:3632)
...
```
This is because the wrong nodes are used to obtain the stripe checksum
sometimes as the node does not correctly use the replicaIndex in the pipeline
to order the nodes.
## What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-7787
## How was this patch tested?
An existing test covers the checksum validate, so it confirms this change
has not broken anything. The actual problem is difficult to reproduce in a unit
test as the rack awareness is not easy to setup in such a way to affect the
node order in the pipeline. We do have a reproducible test with a Hive workload
that causes this, so we can validate the fix that way after this has been
committed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]