[ https://issues.apache.org/jira/browse/HDDS-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854630#comment-17854630 ]

Ashish Kumar commented on HDDS-10632:
-------------------------------------

I was able to reproduce the issue in Docker.
The following configs were changed to reproduce it faster:
 
{code:java}
ozone.client.stream.putblock.piggybacking : true 
ozone.client.incremental.chunk.list : true 
ozone.scm.block.size: 10B 
stream.buffer.size: 2B 
stream.buffer.flush.size: 4B 
stream.buffer.flush.delay: false{code}
 
 
*Steps:*
 * Open a file output stream.
 * Write 8 bytes of data and hsync. (At this step openFileTable has datasize:10 (the default) for block1, and fileTable has datasize:8.)
 * Close the container (for the above data) using the admin command.
 * Write another 8 bytes of data and flush. (At this step openFileTable still has datasize:10, fileTable has datasize:8 for block1, and block2 exists only in openFileTable with datasize:10.) Since the container is CLOSED, the write fails on the 1st block as expected, even though 2 bytes were still available in block1.
 * Recover the lease. In this case it recovers the last block and uses openFileTable data for the previous blocks.

The final fileTable now contains block1 with datasize:10 and block2 with datasize:8.
Because the openFileTable data length is wrong for the 1st block, the wrong length is written to the fileTable.
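The datasize bookkeeping above can be sketched as a toy model. This is not Ozone code: the two maps and the `recoveredFileTable` helper are hypothetical stand-ins for the openFileTable/fileTable entries described in the steps, just to make the arithmetic of the mismatch explicit.

```java
import java.util.HashMap;
import java.util.Map;

public class DatasizeMismatch {

    // Toy model of the recovery path described above: lease recovery trusts
    // openFileTable for all blocks except the last one.
    static Map<String, Long> recoveredFileTable() {
        Map<String, Long> openFileTable = new HashMap<>();
        Map<String, Long> fileTable = new HashMap<>();

        // Write 8 bytes + hsync: openFileTable keeps the default block size
        // (10 here), fileTable records the bytes actually committed.
        openFileTable.put("block1", 10L);
        fileTable.put("block1", 8L);

        // Container closed, next write goes to a new block that exists
        // only in openFileTable.
        openFileTable.put("block2", 10L);

        // Lease recovery: previous blocks are taken from openFileTable,
        // overwriting the correct length 8 with 10.
        fileTable.put("block1", openFileTable.get("block1"));
        // The last block is recovered with its real length.
        fileTable.put("block2", 8L);
        return fileTable;
    }

    public static void main(String[] args) {
        // block1 now wrongly shows 10 even though only 8 bytes exist
        System.out.println(recoveredFileTable());
    }
}
```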

Reading the key then throws "Inconsistent read":
{code:java}
bash-4.2$ ozone sh key get /vol1/bucket1/key1 file1
Inconsistent read for blockID=conID: 1 locID: 113750153625600001 bcsId: 0 length=10 position=8 numBytesToRead=10 numBytesRead=8{code}
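The shape of the check that produces this error can be sketched as follows. This is a simplified, hypothetical reimplementation in the spirit of KeyInputStream.checkPartBytesRead, not the actual Ozone source: with block1's recorded length at 10 but only 8 bytes on the datanode, the read returns fewer bytes than the metadata says should exist.

```java
import java.io.IOException;

public class InconsistentReadCheck {

    // Hedged sketch (assumed, simplified signature): fail when the bytes a
    // block actually returns disagree with what its recorded length implies.
    static void checkPartBytesRead(long length, long position,
                                   int numBytesToRead, int numBytesRead)
            throws IOException {
        if (numBytesRead != numBytesToRead) {
            throw new IOException("Inconsistent read"
                    + " length=" + length + " position=" + position
                    + " numBytesToRead=" + numBytesToRead
                    + " numBytesRead=" + numBytesRead);
        }
    }

    public static void main(String[] args) {
        try {
            // block1's metadata claims 10 bytes, but only 8 were written
            checkPartBytesRead(10, 8, 10, 8);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```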
 
I intentionally closed the container here to reproduce the issue, but there could be many other reasons why a data block is not completely filled and a new block is used for the write.

> [Hbase Ozone] HMaster aborted with "IOException: Inconsistent read"
> -------------------------------------------------------------------
>
>                 Key: HDDS-10632
>                 URL: https://issues.apache.org/jira/browse/HDDS-10632
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Pratyush Bhatt
>            Assignee: Ashish Kumar
>            Priority: Major
>              Labels: pull-request-available
>
> Both the HMasters are down, the HMaster fails with:
> {code:java}
> 2024-04-01 13:15:51,517 ERROR org.apache.hadoop.hbase.master.HMaster: Failed 
> to become active master
> java.io.IOException: Inconsistent read for blockID=conID: 8366 locID: 
> 113750153625964072 bcsId: 0 length=268435456 position=83 numBytesToRead=1 
> numBytesRead=-1
>         at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.checkPartBytesRead(KeyInputStream.java:191)
>         at 
> org.apache.hadoop.hdds.scm.storage.MultipartInputStream.readWithStrategy(MultipartInputStream.java:97)
>         at 
> org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:56)
>         at 
> org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:43)
>         at 
> org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:55)
>         at java.io.FilterInputStream.read(FilterInputStream.java:83)
>         at 
> org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.parseDelimitedFrom(ProtobufUtil.java:3576)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:348)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:95)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:83)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:5298)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:5182)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:998)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:939)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7903)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7860)
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegion.open(MasterRegion.java:307)
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:424)
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:122)
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2216)
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:528)
>         at java.lang.Thread.run(Thread.java:748)
> 2024-04-01 13:15:51,517 ERROR org.apache.hadoop.hbase.master.HMaster: ***** 
> ABORTING master vc0121.halxg.cloudera.com,22001,1711989581483: Unhandled 
> exception. Starting shutdown. *****
> java.io.IOException: Inconsistent read for blockID=conID: 8366 locID: 
> 113750153625964072 bcsId: 0 length=268435456 position=83 numBytesToRead=1 
> numBytesRead=-1
>         at 
> org.apache.hadoop.ozone.client.io.KeyInputStream.checkPartBytesRead(KeyInputStream.java:191)
>         at 
> org.apache.hadoop.hdds.scm.storage.MultipartInputStream.readWithStrategy(MultipartInputStream.java:97)
>         at 
> org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:56)
>         at 
> org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:43)
>         at 
> org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:55)
>         at java.io.FilterInputStream.read(FilterInputStream.java:83)
>         at 
> org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.parseDelimitedFrom(ProtobufUtil.java:3576)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:348)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:95)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:83)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:5298)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:5182)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:998)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:939)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7903)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7860)
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegion.open(MasterRegion.java:307)
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:424)
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:122)
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2216)
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:528)
>         at java.lang.Thread.run(Thread.java:748)
> 2024-04-01 13:15:51,517 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: ***** STOPPING region 
> server 'vc0121.xyz,22001,1711989581483' ***** {code}


