Pratyush Bhatt created HDDS-11012:
-------------------------------------

             Summary: [Hbase-Ozone] HMaster down with NO_REPLICA_FOUND causing 
"CorruptHFileException: Problem reading HFile Trailer"
                 Key: HDDS-11012
                 URL: https://issues.apache.org/jira/browse/HDDS-11012
             Project: Apache Ozone
          Issue Type: Bug
          Components: SCM
            Reporter: Pratyush Bhatt


Both HMasters went down abruptly with _IllegalArgumentException: NO_REPLICA_FOUND_,
which surfaces as _"CorruptHFileException: Problem reading HFile Trailer from file"_.

*Stack Trace:*
{code:java}
2024-06-13 02:57:51,744 ERROR org.apache.hadoop.hbase.master.HMaster: Failed to become active master
java.io.IOException: java.io.IOException: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file ofs://ozone1717496222/volhbase-new07062024/buckethbase-1717572506/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/proc/91207977e6d74ba2ba6a564570832563
        at org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1144)
        at org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1087)
        at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:990)
        at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:940)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7904)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7861)
        at org.apache.hadoop.hbase.master.region.MasterRegion.open(MasterRegion.java:307)
        at org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:424)
        at org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:122)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
        at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2216)
        at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:528)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file ofs://ozone1717496222/volhbase-new07062024/buckethbase-1717572506/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/proc/91207977e6d74ba2ba6a564570832563
        at org.apache.hadoop.hbase.regionserver.StoreEngine.openStoreFiles(StoreEngine.java:284)
        at org.apache.hadoop.hbase.regionserver.StoreEngine.initialize(StoreEngine.java:334)
        at org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:306)
        at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:6365)
        at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1110)
        at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1107)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        ... 1 more
Caused by: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file ofs://ozone1717496222/volhbase-new07062024/buckethbase-1717572506/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/proc/91207977e6d74ba2ba6a564570832563
        at org.apache.hadoop.hbase.io.hfile.HFileInfo.initTrailerAndContext(HFileInfo.java:349)
        at org.apache.hadoop.hbase.io.hfile.HFileInfo.<init>(HFileInfo.java:123)
        at org.apache.hadoop.hbase.regionserver.StoreFileInfo.initHFileInfo(StoreFileInfo.java:706)
        at org.apache.hadoop.hbase.regionserver.HStoreFile.open(HStoreFile.java:364)
        at org.apache.hadoop.hbase.regionserver.HStoreFile.initReader(HStoreFile.java:485)
        at org.apache.hadoop.hbase.regionserver.StoreEngine.createStoreFileAndReader(StoreEngine.java:224)
        at org.apache.hadoop.hbase.regionserver.StoreEngine.lambda$openStoreFiles$0(StoreEngine.java:262)
        ... 6 more
Caused by: java.lang.IllegalArgumentException: NO_REPLICA_FOUND
        at org.apache.hadoop.ozone.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
        at org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:180)
        at org.apache.hadoop.hdds.scm.XceiverClientManager.acquireClientForReadData(XceiverClientManager.java:161)
        at org.apache.hadoop.hdds.scm.storage.BlockInputStream.acquireClient(BlockInputStream.java:342)
        at org.apache.hadoop.hdds.scm.storage.BlockInputStream.getBlockData(BlockInputStream.java:258)
        at org.apache.hadoop.hdds.scm.storage.BlockInputStream.initialize(BlockInputStream.java:164)
        at org.apache.hadoop.hdds.scm.storage.BlockInputStream.readWithStrategy(BlockInputStream.java:370)
        at org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:56)
        at org.apache.hadoop.hdds.scm.storage.ByteArrayReader.readFromBlock(ByteArrayReader.java:54)
        at org.apache.hadoop.hdds.scm.storage.MultipartInputStream.readWithStrategy(MultipartInputStream.java:96)
        at org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:56)
        at org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:81)
        at java.io.DataInputStream.readFully(DataInputStream.java:195)
        at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:394)
        at org.apache.hadoop.hbase.io.hfile.HFileInfo.initTrailerAndContext(HFileInfo.java:339)
        ... 12 more
2024-06-13 02:57:51,745 ERROR org.apache.hadoop.hbase.master.HMaster: ***** ABORTING master vc0121.xyz.com,22001,1718272586518: Unhandled exception. Starting shutdown. *****
{code}
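
For triage, it may help to see why the top-level message mentions a corrupt HFile even though the innermost cause in the trace is a client-side {{NO_REPLICA_FOUND}} precondition failure. The following is only a minimal, self-contained sketch of that wrapping pattern. Every class and method name in it ({{RootCauseDemo}}, {{checkArgument}}, {{acquireClient}}, {{readTrailer}}, {{rootCause}}, the {{replicaCount}} parameter and the sample path) is a hypothetical stand-in and not the actual HBase or Ozone code; the real condition that triggers {{NO_REPLICA_FOUND}} is not shown in this report and is assumed here to be "no replica available".

```java
import java.io.IOException;

public class RootCauseDemo {

    // Hypothetical stand-in for a Guava-style precondition check:
    // a failed check becomes an unchecked IllegalArgumentException.
    static void checkArgument(boolean condition, Object message) {
        if (!condition) {
            throw new IllegalArgumentException(String.valueOf(message));
        }
    }

    // Hypothetical stand-in for the storage client layer.
    // Assumption: the check fails when no replica is available.
    static void acquireClient(int replicaCount) {
        checkArgument(replicaCount > 0, "NO_REPLICA_FOUND");
    }

    // Hypothetical stand-in for a trailer reader that wraps any failure,
    // including unchecked ones, in an IOException naming the file.
    static void readTrailer(String path, int replicaCount) throws IOException {
        try {
            acquireClient(replicaCount);
        } catch (RuntimeException e) {
            throw new IOException("Problem reading HFile Trailer from file " + path, e);
        }
    }

    // Walks the "Caused by" chain down to the innermost exception.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        try {
            readTrailer("/hypothetical/path/to/hfile", 0);
        } catch (IOException e) {
            Throwable root = rootCause(e);
            System.out.println("Surface error: " + e.getMessage());
            System.out.println("Root cause:    " + root.getClass().getName()
                    + ": " + root.getMessage());
        }
    }
}
```

Under these assumptions the file itself need not be damaged: the read fails before any trailer bytes are returned, and the wrapper reports it as a trailer-read problem. Whether that is what happened here would still need to be confirmed against the container and replica state in SCM.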
cc: [~ashishk] [~Sammi] [~weichiu] 

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
