[
https://issues.apache.org/jira/browse/HDDS-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854630#comment-17854630
]
Ashish Kumar commented on HDDS-10632:
-------------------------------------
Able to reproduce the issue in Docker.
The following configs were changed to reproduce it faster:
{code:java}
ozone.client.stream.putblock.piggybacking: true
ozone.client.incremental.chunk.list: true
ozone.scm.block.size: 10B
stream.buffer.size: 2B
stream.buffer.flush.size: 4B
stream.buffer.flush.delay: false{code}
*Steps:*
* Open a file output stream.
* Write 8 bytes of data and do hsync. (At this step the openFileTable has datasize:10 (the default) for block1, and the fileTable has datasize:8.)
* Close the container (holding the above data) using the admin command.
* Write another 8 bytes of data and do flush. (At this step the openFileTable still has datasize:10; the fileTable has datasize:8 for block1; block2 exists only in the openFileTable with datasize:10.) Since the container is CLOSED, the write fails on the 1st block as expected, even though 2 bytes were still available in block1.
* Do lease recovery. It recovers the last block but uses openFileTable data for the previous blocks.

The final fileTable then contains block1 with datasize:10 and block2 with datasize:8.
Since the openFileTable data length is wrong for the 1st block, the wrong value is written to the fileTable.
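The divergence between the two tables can be sketched with a small standalone simulation. All class and field names below are illustrative, not actual Ozone types; the real read-side check lives in KeyInputStream.checkPartBytesRead:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal, self-contained model of the bookkeeping described above.
public class LeaseRecoveryRepro {
    // Per-block entry: length recorded in metadata vs bytes actually written.
    static class Block {
        final long recordedLen; // datasize the table believes
        final long actualLen;   // bytes really present on datanodes
        Block(long recorded, long actual) { recordedLen = recorded; actualLen = actual; }
    }

    public static void main(String[] args) {
        long blockSize = 10;

        // Write 8 bytes + hsync: openFileTable pre-allocates the full block
        // size, while fileTable records the real 8 bytes.
        List<Block> openFileTable = new ArrayList<>();
        openFileTable.add(new Block(blockSize, 8)); // block1: datasize 10 (default)

        // Container closed; the next 8 bytes go to a new block that so far
        // exists only in the openFileTable.
        openFileTable.add(new Block(blockSize, 8)); // block2: datasize 10

        // Lease recovery fixes up only the last block and keeps the stale
        // openFileTable entries for the earlier ones.
        List<Block> recovered = new ArrayList<>();
        recovered.add(openFileTable.get(0)); // block1 kept with wrong datasize 10
        recovered.add(new Block(8, 8));      // block2 corrected to datasize 8

        // Read path: same shape of check as checkPartBytesRead.
        for (Block b : recovered) {
            long numBytesToRead = b.recordedLen;
            long numBytesRead = Math.min(b.actualLen, numBytesToRead);
            if (numBytesRead != numBytesToRead) {
                System.out.println("Inconsistent read: length=" + b.recordedLen
                        + " numBytesToRead=" + numBytesToRead
                        + " numBytesRead=" + numBytesRead);
            }
        }
    }
}
```

Running this prints one "Inconsistent read" line for block1 (numBytesToRead=10, numBytesRead=8), mirroring the error below.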
Reading the key then throws "Inconsistent read":
{code:java}
bash-4.2$ ozone sh key get /vol1/bucket1/key1 file1
Inconsistent read for blockID=conID: 1 locID: 113750153625600001 bcsId: 0 length=10 position=8 numBytesToRead=10 numBytesRead=8{code}
I intentionally closed the container here to reproduce the issue, but there could be many other reasons why a data block is not completely filled and a new block is used for the write.
> [Hbase Ozone] HMaster aborted with "IOException: Inconsistent read"
> -------------------------------------------------------------------
>
> Key: HDDS-10632
> URL: https://issues.apache.org/jira/browse/HDDS-10632
> Project: Apache Ozone
> Issue Type: Bug
> Components: SCM
> Reporter: Pratyush Bhatt
> Assignee: Ashish Kumar
> Priority: Major
> Labels: pull-request-available
>
> Both HMasters are down; the HMaster fails with:
> {code:java}
> 2024-04-01 13:15:51,517 ERROR org.apache.hadoop.hbase.master.HMaster: Failed to become active master
> java.io.IOException: Inconsistent read for blockID=conID: 8366 locID: 113750153625964072 bcsId: 0 length=268435456 position=83 numBytesToRead=1 numBytesRead=-1
>         at org.apache.hadoop.ozone.client.io.KeyInputStream.checkPartBytesRead(KeyInputStream.java:191)
>         at org.apache.hadoop.hdds.scm.storage.MultipartInputStream.readWithStrategy(MultipartInputStream.java:97)
>         at org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:56)
>         at org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:43)
>         at org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:55)
>         at java.io.FilterInputStream.read(FilterInputStream.java:83)
>         at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.parseDelimitedFrom(ProtobufUtil.java:3576)
>         at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:348)
>         at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:95)
>         at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:83)
>         at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:5298)
>         at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:5182)
>         at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:998)
>         at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:939)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7903)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7860)
>         at org.apache.hadoop.hbase.master.region.MasterRegion.open(MasterRegion.java:307)
>         at org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:424)
>         at org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:122)
>         at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
>         at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2216)
>         at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:528)
>         at java.lang.Thread.run(Thread.java:748)
> 2024-04-01 13:15:51,517 ERROR org.apache.hadoop.hbase.master.HMaster: ***** ABORTING master vc0121.halxg.cloudera.com,22001,1711989581483: Unhandled exception. Starting shutdown. *****
> java.io.IOException: Inconsistent read for blockID=conID: 8366 locID: 113750153625964072 bcsId: 0 length=268435456 position=83 numBytesToRead=1 numBytesRead=-1
>         at org.apache.hadoop.ozone.client.io.KeyInputStream.checkPartBytesRead(KeyInputStream.java:191)
>         at org.apache.hadoop.hdds.scm.storage.MultipartInputStream.readWithStrategy(MultipartInputStream.java:97)
>         at org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:56)
>         at org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:43)
>         at org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:55)
>         at java.io.FilterInputStream.read(FilterInputStream.java:83)
>         at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.parseDelimitedFrom(ProtobufUtil.java:3576)
>         at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:348)
>         at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:95)
>         at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:83)
>         at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:5298)
>         at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:5182)
>         at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:998)
>         at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:939)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7903)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7860)
>         at org.apache.hadoop.hbase.master.region.MasterRegion.open(MasterRegion.java:307)
>         at org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:424)
>         at org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:122)
>         at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
>         at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2216)
>         at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:528)
>         at java.lang.Thread.run(Thread.java:748)
> 2024-04-01 13:15:51,517 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: ***** STOPPING region server 'vc0121.xyz,22001,1711989581483' ***** {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)