[
https://issues.apache.org/jira/browse/HDDS-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963835#comment-16963835
]
Istvan Fajth edited comment on HDDS-2376 at 10/31/19 10:08 AM:
---------------------------------------------------------------
Hi [~Sammi],
I have run into a similar exception in one of our test environments while preparing some testing. It appeared after I updated the Ozone and Ratis jars on the cluster, and I couldn't get to the bottom of it, as there were some other minor changes as well. After rewriting the data everything started to work properly, and I haven't been able to reproduce it since.
Could the same have happened on your side? Was there an update to Ozone after which you started to see this?
was (Author: pifta):
Hi [~Sammi],
I have run into a similar exception in one of our test environments, while I
was preparing some testing, this appeared after I have updated the Ozone and
Ratis jars on the cluster, and couldn't get to the bottom of it as there were
some other minor changes as well, and after rewriting the data everything
started to work properly, then I couldn't get to a reproduction so far.
Could the same happen on your side? Were there an update on Ozone after which
you started to see this?
> Fail to read data through XceiverClientGrpc
> -------------------------------------------
>
> Key: HDDS-2376
> URL: https://issues.apache.org/jira/browse/HDDS-2376
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Sammi Chen
> Assignee: Hanisha Koneru
> Priority: Blocker
>
> Ran teragen; the application failed with the following stack trace:
> 19/10/29 14:35:42 INFO mapreduce.Job: Running job: job_1567133159094_0048
> 19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 running in uber mode : false
> 19/10/29 14:35:59 INFO mapreduce.Job: map 0% reduce 0%
> 19/10/29 14:35:59 INFO mapreduce.Job: Job job_1567133159094_0048 failed with state FAILED due to: Application application_1567133159094_0048 failed 2 times due to AM Container for appattempt_1567133159094_0048_000002 exited with exitCode: -1000
> For more detailed output, check application tracking page: http://host183:8088/cluster/app/application_1567133159094_0048 Then, click on links to logs of each attempt.
> Diagnostics: Unexpected OzoneException: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at index 0
> java.io.IOException: Unexpected OzoneException: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at index 0
> at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:342)
> at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307)
> at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259)
> at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144)
> at org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239)
> at org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
> at org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52)
> at java.io.DataInputStream.read(DataInputStream.java:100)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:86)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:60)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:120)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
> at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267)
> at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at index 0
> at org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148)
> at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275)
> at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238)
> at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.lambda$new$0(ChunkInputStream.java:375)
> at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:287)
> at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:250)
> at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:233)
> at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:245)
> at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:335)
> ... 26 more
> Caused by: Checksum mismatch at index 0
> org.apache.hadoop.ozone.common.OzoneChecksumException: Checksum mismatch at index 0
> at org.apache.hadoop.ozone.common.ChecksumData.verifyChecksumDataMatches(ChecksumData.java:148)
> at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:275)
> at org.apache.hadoop.ozone.common.Checksum.verifyChecksum(Checksum.java:238)
> at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.lambda$new$0(ChunkInputStream.java:375)
> at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:287)
> at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:250)
> at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:233)
> at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:245)
> at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:335)
> at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:307)
> at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:259)
> at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:144)
> at org.apache.hadoop.hdds.scm.storage.BlockInputStream.read(BlockInputStream.java:239)
> at org.apache.hadoop.ozone.client.io.KeyInputStream.read(KeyInputStream.java:171)
> at org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:52)
> at java.io.DataInputStream.read(DataInputStream.java:100)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:86)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:60)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:120)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
> at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:267)
> at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
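> For context, the failure above is raised when the client compares the checksums stored with a chunk against checksums recomputed over the bytes it received, and the message reports the index of the first block that differs. The sketch below is a generic illustration of that pattern only; it is not Ozone's actual `Checksum.verifyChecksum` implementation, and it uses plain CRC32 (class name `ChecksumCheck` and method `firstMismatch` are made up for this example):
>
> ```java
> import java.util.zip.CRC32;
>
> public class ChecksumCheck {
>     // Recompute a CRC32 per chunk and compare with the stored value;
>     // return the index of the first mismatching chunk, or -1 if all match.
>     static int firstMismatch(byte[][] chunks, long[] expected) {
>         for (int i = 0; i < chunks.length; i++) {
>             CRC32 crc = new CRC32();
>             crc.update(chunks[i], 0, chunks[i].length);
>             if (crc.getValue() != expected[i]) {
>                 return i; // a real client would throw a checksum exception here
>             }
>         }
>         return -1; // all chunks verified
>     }
>
>     public static void main(String[] args) {
>         byte[] data = "chunk-data".getBytes();
>         CRC32 crc = new CRC32();
>         crc.update(data, 0, data.length);
>         long stored = crc.getValue();
>         // Flip one bit of the stored checksum: mismatch reported at index 0,
>         // analogous to "Checksum mismatch at index 0" in the trace above.
>         System.out.println(firstMismatch(new byte[][]{data}, new long[]{stored ^ 1L}));
>         System.out.println(firstMismatch(new byte[][]{data}, new long[]{stored}));
>     }
> }
> ```
>
> A mismatch at index 0 therefore points at the very first chunk read, which is consistent with data (or its stored checksums) having been rewritten or corrupted rather than a partial-read problem.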
--
This message was sent by Atlassian Jira
(v8.3.4#803005)