[ 
https://issues.apache.org/jira/browse/HDFS-15732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260256#comment-17260256
 ] 

gaozhan ding commented on HDFS-15732:
-------------------------------------

[~weichiu] The case here is that in function 
DFSStripedInputStream::createBlockReader, when invoke getBlockReader, it will 
throw an IOException instead of invalid block token exception when do SASL 
negotiation  if the token is expired. So it will not refetch token and do a 
retry   .PR2558 is available and can avoid this scene.

> EC client will not retry get block token when block token expired  in 
> kerberized cluster
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-15732
>                 URL: https://issues.apache.org/jira/browse/HDFS-15732
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: dfsclient, ec, erasure-coding
>    Affects Versions: 3.1.1
>         Environment: hadoop 3.1.1
> kerberos
> ec RS-3-2-1024k
>            Reporter: gaozhan ding
>            Assignee: gaozhan ding
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When enable ec policy on hbase, we got some issues. Once block token was 
> expired in datanode side, client side will not identify the InvalidToken 
> error because of the SASL negotiation. As a result, ec client will not do 
> retry by refetch token when create blockreader.  Then the peer datanode was 
> added to DeadNodes, and all calls to function createBlockReader aim at this 
> datanode in current DFSStripedInputStream will consider this datanode  was 
> dead and return false. The finally result is a read failure.
> Some logs :
> hbase regionserver:
> 2020-12-17 10:00:24,291 WARN 
> [RpcServer.default.FPBQ.Fifo.handler=15,queue=0,port=16020] hdfs.DFSClient: 
> Failed to connect to /10.65.19.41:9866 for 
> blockBP-1601568648-10.65.19.12-1550823043026:blk_-9223372036813273566_672859566
> java.io.IOException: DIGEST-MD5: IO error acquiring password
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:421)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:479)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getSaslStreams(SaslDataTransferClient.java:393)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:267)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:215)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:160)
>  at 
> org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:647)
>  at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2936)
>  at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:821)
>  at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:746)
>  at 
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:379)
>  at 
> org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:647)
>  at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.createBlockReader(DFSStripedInputStream.java:272)
>  at org.apache.hadoop.hdfs.StripeReader.readChunk(StripeReader.java:333)
>  at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:365)
>  at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.fetchBlockByteRange(DFSStripedInputStream.java:514)
>  at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1354)
>  at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1318)
>  at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock.positionalReadWithExtra(HFileBlock.java:808)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1568)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1772)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1597)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1496)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:340)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:856)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:806)
>  at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:327)
>  at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:228)
>  at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:395)
>  at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:250)
>  at 
> org.apache.hadoop.hbase.regionserver.HStore.createScanner(HStore.java:2031)
>  at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2022)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:6408)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:6388)
>  at 
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2926)
>  
>  
> datanode log:
> 2020-12-17 10:00:24,290 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: 
> hadoop-btdn0005.eniot.io:9866:DataXceiver error processing unknown operation 
> src: /10.65.19.42:53894 dst: /10.65.19.41:9866
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password 
> [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Block 
> token with block_token_identifier (expiryDate=1608192045256, 
> keyId=1449537289, userId=hbase, 
> blockPoolId=BP-1601568648-10.65.19.12-1550823043026, 
> blockId=-9223372036813273566, access modes=[READ], storageTypes= [DISK, DISK, 
> DISK, DISK, DISK], storageIds= [DS-604a9eaf-94ba-4127-b70d-436361c49ecd, 
> DS-b53bf503-84c1-4e9e-aede-a19f88e3fc9a, 
> DS-a1ee0117-7430-4181-b279-3f1b16541567, 
> DS-e8459904-e1e1-44a6-affa-9ce9fc56f877, 
> DS-9f54936c-96cc-4c85-91e6-a6f8e00f246a]) is expired.]
>  at 
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598)
>  at 
> com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslParticipant.evaluateChallengeOrResponse(SaslParticipant.java:115)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:376)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:300)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:127)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Block 
> token with block_token_identifier (expiryDate=1608192045256, 
> keyId=1449537289, userId=hbase, 
> blockPoolId=BP-1601568648-10.65.19.12-1550823043026, 
> blockId=-9223372036813273566, access modes=[READ], storageTypes= [DISK, DISK, 
> DISK, DISK, DISK], storageIds= [DS-604a9eaf-94ba-4127-b70d-436361c49ecd, 
> DS-b53bf503-84c1-4e9e-aede-a19f88e3fc9a, 
> DS-a1ee0117-7430-4181-b279-3f1b16541567, 
> DS-e8459904-e1e1-44a6-affa-9ce9fc56f877, 
> DS-9f54936c-96cc-4c85-91e6-a6f8e00f246a]) is expired.
>  at 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.retrievePassword(BlockTokenSecretManager.java:479)
>  at 
> org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.retrievePassword(BlockPoolTokenSecretManager.java:81)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.buildServerPassword(SaslDataTransferServer.java:318)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.access$100(SaslDataTransferServer.java:73)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$2.apply(SaslDataTransferServer.java:297)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$SaslServerCallbackHandler.handle(SaslDataTransferServer.java:241)
>  at 
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
>  ... 7 more
>  
> final result
>  
>  
> java.io.IOException: Could not iterate StoreFileScanner[HFileScanner for 
> reader 
> reader=hdfs://azbeta/hbase/data/o15427722038191/RAW_569dfc35-10dc-4fe4-bb8c-5c403368d6c1/ccf146fb5c852d751b6b2f8307f81fa0/d/386bdb6978d348f4b4e168e728762842,
>  compression=snappy, cacheConf=blockCache=LruBlockCache\{blockCount=7915, 
> currentSize=489.13 MB, freeSize=2.71 GB, maxSize=3.19 GB, heapSize=489.13 MB, 
> minSize=3.03 GB, minFactor=0.95, multiSize=1.52 GB, multiFactor=0.5, 
> singleSize=775.81 MB, singleFactor=0.25}, cacheDataOnRead=true, 
> cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, 
> cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false, 
> firstKey=Optional[\x01\x00\x00\x13\xBD\x00\x00\x00T\x070\x97\xF7/d:arm_xingneng_test_model@AI1/1603445985858/Put/seqid=0],
>  
> lastKey=Optional[\x01\x00\x00\x16\xF9\x00\x00\x00T\x073-8/d:arm_xingneng_test_model@AI99/1603446144699/Put/seqid=0],
>  avgKeyLen=53, avgValueLen=4, entries=2016557, length=22957603, 
> cur=\x01\x00\x00\x14#\x00\x00\x00T\x072\xCC\xF7/d:arm_xingneng_test_model@AI84/1603446131146/Put/vlen=4/seqid=17230816]
>  at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:217)
>  at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:120)
>  at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:653)
>  at 
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:388)
>  at 
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:327)
>  at 
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:65)
>  at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:126)
>  at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1410)
>  at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2187)
>  at 
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:596)
>  at 
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:638)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: 3 missing blocks, the stripe is: 
> AlignedStripe(Offset=1048576, length=1048576, fetchedChunksNum=0, 
> missingChunksNum=3); locatedBlocks is: LocatedBlocks\{; fileLength=22957603; 
> underConstruction=false; 
> blocks=[LocatedStripedBlock{BP-1197414916-10.27.20.30-1535978156945:blk_-9223372036830849056_317784750;
>  getBlockSize()=22957603; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[10.27.20.42:9866,DS-d0d5a7ce-8280-45fe-b910-1e5c7b579367,DISK],
>  
> DatanodeInfoWithStorage[10.27.22.86:9866,DS-b018c729-66d5-4953-94f1-ec7664a46cb7,DISK],
>  
> DatanodeInfoWithStorage[10.27.22.79:9866,DS-d79c402a-1845-4b3f-893e-f84d94085b2a,DISK],
>  
> DatanodeInfoWithStorage[10.27.20.41:9866,DS-b07ba1e0-9a34-4caf-ad51-6d5f302c08ea,DISK],
>  
> DatanodeInfoWithStorage[10.27.20.39:9866,DS-8a059d87-f122-4fed-a6a2-e45662692305,DISK]];
>  indices=[0, 1, 2, 3, 4]}]; 
> lastLocatedBlock=LocatedStripedBlock\{BP-1197414916-10.27.20.30-1535978156945:blk_-9223372036830849056_317784750;
>  getBlockSize()=22957603; corrupt=false; offset=0; 
> locs=[DatanodeInfoWithStorage[10.27.20.42:9866,DS-d0d5a7ce-8280-45fe-b910-1e5c7b579367,DISK],
>  
> DatanodeInfoWithStorage[10.27.22.86:9866,DS-b018c729-66d5-4953-94f1-ec7664a46cb7,DISK],
>  
> DatanodeInfoWithStorage[10.27.22.79:9866,DS-d79c402a-1845-4b3f-893e-f84d94085b2a,DISK],
>  
> DatanodeInfoWithStorage[10.27.20.41:9866,DS-b07ba1e0-9a34-4caf-ad51-6d5f302c08ea,DISK],
>  
> DatanodeInfoWithStorage[10.27.20.39:9866,DS-8a059d87-f122-4fed-a6a2-e45662692305,DISK]];
>  indices=[0, 1, 2, 3, 4]}; isLastBlockComplete=true; 
> ecPolicy=ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, 
> numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2]}
>  at 
> org.apache.hadoop.hdfs.StripeReader.checkMissingBlocks(StripeReader.java:177)
>  at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:372)
>  at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:324)
>  at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:397)
>  at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:820)
>  at java.io.DataInputStream.read(DataInputStream.java:149)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:765)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1562)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1772)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1597)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1496)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readNextDataBlock(HFileReaderImpl.java:931)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.isNextBlock(HFileReaderImpl.java:1064)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.positionForNextBlock(HFileReaderImpl.java:1058)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl._next(HFileReaderImpl.java:1076)
>  at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.next(HFileReaderImpl.java:1097)
>  at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:208)
>  ... 13 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to