sodonnel opened a new pull request, #4180:
URL: https://github.com/apache/ozone/pull/4180

   ## What changes were proposed in this pull request?
   
   When calculating a checksum for an EC file with Rack Topology enabled, you 
can get the following error intermittently:
   
   ```
   ERROR : Failed with exception null
     java.lang.IndexOutOfBoundsException
           at java.nio.ByteBuffer.wrap(ByteBuffer.java:375)
           at org.apache.hadoop.ozone.client.checksum.ECBlockChecksumComputer.computeCompositeCrc(ECBlockChecksumComputer.java:163)
           at org.apache.hadoop.ozone.client.checksum.ECBlockChecksumComputer.compute(ECBlockChecksumComputer.java:65)
           at org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.getBlockChecksumFromChunkChecksums(ECFileChecksumHelper.java:148)
           at org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.checksumBlock(ECFileChecksumHelper.java:106)
           at org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.checksumBlocks(ECFileChecksumHelper.java:73)
           at org.apache.hadoop.ozone.client.checksum.BaseFileChecksumHelper.compute(BaseFileChecksumHelper.java:220)
           at org.apache.hadoop.fs.ozone.OzoneClientUtils.getFileChecksumWithCombineMode(OzoneClientUtils.java:223)
           at org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.getFileChecksum(BasicRootedOzoneClientAdapterImpl.java:1123)
           at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.getFileChecksum(BasicRootedOzoneFileSystem.java:955)
           at org.apache.hadoop.fs.FileSystem.getFileChecksum(FileSystem.java:2831)
           at org.apache.hadoop.hive.ql.metadata.Hive.addInsertNonDirectoryInformation(Hive.java:3659)
           at org.apache.hadoop.hive.ql.metadata.Hive.addInsertFileInformation(Hive.java:3632)
   ...
   ERROR : FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. java.lang.IndexOutOfBoundsException
           at java.nio.ByteBuffer.wrap(ByteBuffer.java:375)
           at org.apache.hadoop.ozone.client.checksum.ECBlockChecksumComputer.computeCompositeCrc(ECBlockChecksumComputer.java:163)
           at org.apache.hadoop.ozone.client.checksum.ECBlockChecksumComputer.compute(ECBlockChecksumComputer.java:65)
           at org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.getBlockChecksumFromChunkChecksums(ECFileChecksumHelper.java:148)
           at org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.checksumBlock(ECFileChecksumHelper.java:106)
           at org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.checksumBlocks(ECFileChecksumHelper.java:73)
           at org.apache.hadoop.ozone.client.checksum.BaseFileChecksumHelper.compute(BaseFileChecksumHelper.java:220)
           at org.apache.hadoop.fs.ozone.OzoneClientUtils.getFileChecksumWithCombineMode(OzoneClientUtils.java:223)
           at org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.getFileChecksum(BasicRootedOzoneClientAdapterImpl.java:1123)
           at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.getFileChecksum(BasicRootedOzoneFileSystem.java:955)
           at org.apache.hadoop.fs.FileSystem.getFileChecksum(FileSystem.java:2831)
           at org.apache.hadoop.hive.ql.metadata.Hive.addInsertNonDirectoryInformation(Hive.java:3659)
           at org.apache.hadoop.hive.ql.metadata.Hive.addInsertFileInformation(Hive.java:3632)
   ...
   INFO  : Completed executing command(queryId=hive_20221214035652_bc45477d-98df-408e-b945-a63b4ac6896a); Time taken: 22.167 seconds
     INFO  : OK
     Error: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.exec.MoveTask. java.lang.IndexOutOfBoundsException
           at java.nio.ByteBuffer.wrap(ByteBuffer.java:375)
           at org.apache.hadoop.ozone.client.checksum.ECBlockChecksumComputer.computeCompositeCrc(ECBlockChecksumComputer.java:163)
           at org.apache.hadoop.ozone.client.checksum.ECBlockChecksumComputer.compute(ECBlockChecksumComputer.java:65)
           at org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.getBlockChecksumFromChunkChecksums(ECFileChecksumHelper.java:148)
           at org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.checksumBlock(ECFileChecksumHelper.java:106)
           at org.apache.hadoop.ozone.client.checksum.ECFileChecksumHelper.checksumBlocks(ECFileChecksumHelper.java:73)
           at org.apache.hadoop.ozone.client.checksum.BaseFileChecksumHelper.compute(BaseFileChecksumHelper.java:220)
           at org.apache.hadoop.fs.ozone.OzoneClientUtils.getFileChecksumWithCombineMode(OzoneClientUtils.java:223)
           at org.apache.hadoop.fs.ozone.BasicRootedOzoneClientAdapterImpl.getFileChecksum(BasicRootedOzoneClientAdapterImpl.java:1123)
           at org.apache.hadoop.fs.ozone.BasicRootedOzoneFileSystem.getFileChecksum(BasicRootedOzoneFileSystem.java:955)
           at org.apache.hadoop.fs.FileSystem.getFileChecksum(FileSystem.java:2831)
           at org.apache.hadoop.hive.ql.metadata.Hive.addInsertNonDirectoryInformation(Hive.java:3659)
           at org.apache.hadoop.hive.ql.metadata.Hive.addInsertFileInformation(Hive.java:3632)
           ...
   ```
   
   This happens because the wrong nodes are sometimes used to obtain the stripe 
checksum: the nodes are not correctly ordered by the replicaIndex in the 
pipeline.
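
As an illustration of the idea only, here is a minimal sketch of ordering nodes by replica index rather than by the order in which the pipeline happens to list them. The class `ReplicaIndexOrdering`, the method `orderByReplicaIndex`, and the node names are hypothetical and are not taken from the Ozone code base or from this patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the Ozone implementation.
final class ReplicaIndexOrdering {

    /**
     * Returns node names sorted by ascending replica index, regardless of
     * the order in which the map (standing in for a pipeline) lists them.
     */
    static List<String> orderByReplicaIndex(Map<String, Integer> replicaIndexes) {
        List<Map.Entry<String, Integer>> entries =
            new ArrayList<>(replicaIndexes.entrySet());
        // Sort on the replica index (the map value), not on insertion order.
        entries.sort(Map.Entry.comparingByValue());

        List<String> ordered = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            ordered.add(entry.getKey());
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Assumed scenario: topology-aware sorting has reordered the nodes.
        Map<String, Integer> pipeline = new LinkedHashMap<>();
        pipeline.put("dn-c", 3);
        pipeline.put("dn-a", 1);
        pipeline.put("dn-b", 2);

        System.out.println(orderByReplicaIndex(pipeline)); // prints [dn-a, dn-b, dn-c]
    }
}
```

With this kind of ordering, a caller that expects the first data replica at position 0 gets it even when the node list was reordered for rack locality.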
   
   ## What is the link to the Apache JIRA?
   
   https://issues.apache.org/jira/browse/HDDS-7787
   
   ## How was this patch tested?
   
   An existing test covers checksum validation, so it confirms this change 
has not broken anything. The actual problem is difficult to reproduce in a unit 
test, because rack awareness is not easy to set up in a way that affects the 
node order in the pipeline. We do have a reproducible test with a Hive workload 
that triggers the error, so we can validate the fix that way after this has 
been committed.
   

