[
https://issues.apache.org/jira/browse/HDFS-15732?focusedWorklogId=525537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-525537
]
ASF GitHub Bot logged work on HDFS-15732:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 17/Dec/20 12:57
Start Date: 17/Dec/20 12:57
Worklog Time Spent: 10m
Work Description: dgzdot opened a new pull request #2557:
URL: https://github.com/apache/hadoop/pull/2557
Since ec client could not identify the InvalidToken error because of the
SASL negotiation, always reftch token when encounter error for the first time.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 525537)
Remaining Estimate: 0h
Time Spent: 10m
> EC client will not retry get block token when block token expired in
> kerberized cluster
> ----------------------------------------------------------------------------------------
>
> Key: HDFS-15732
> URL: https://issues.apache.org/jira/browse/HDFS-15732
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: dfsclient, ec, erasure-coding
> Affects Versions: 3.1.1
> Environment: hadoop 3.1.1
> kerberos
> ec RS-3-2-1024k
> Reporter: gaozhan ding
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When enable ec policy on hbase, we got some issues. Once block token was
> expired in datanode side, client side will not identify the InvalidToken
> error because of the SASL negotiation. As a result, ec client will not do
> retry by refetch token when create blockreader. Then the peer datanode was
> added to DeadNodes, and all calls to function createBlockReader aim at this
> datanode in current DFSStripedInputStream will consider this datanode was
> dead and return false. The finally result is a read failure.
> Some logs :
> hbase regionserver:
> 2020-12-17 10:00:24,291 WARN
> [RpcServer.default.FPBQ.Fifo.handler=15,queue=0,port=16020] hdfs.DFSClient:
> Failed to connect to /10.65.19.41:9866 for
> blockBP-1601568648-10.65.19.12-1550823043026:blk_-9223372036813273566_672859566
> java.io.IOException: DIGEST-MD5: IO error acquiring password
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:421)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:479)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getSaslStreams(SaslDataTransferClient.java:393)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:267)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:215)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:160)
> at
> org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:647)
> at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2936)
> at
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:821)
> at
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:746)
> at
> org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:379)
> at
> org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:647)
> at
> org.apache.hadoop.hdfs.DFSStripedInputStream.createBlockReader(DFSStripedInputStream.java:272)
> at org.apache.hadoop.hdfs.StripeReader.readChunk(StripeReader.java:333)
> at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:365)
> at
> org.apache.hadoop.hdfs.DFSStripedInputStream.fetchBlockByteRange(DFSStripedInputStream.java:514)
> at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1354)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1318)
> at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock.positionalReadWithExtra(HFileBlock.java:808)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1568)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1772)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1597)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1496)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:340)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:856)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:806)
> at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:327)
> at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:228)
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:395)
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:250)
> at
> org.apache.hadoop.hbase.regionserver.HStore.createScanner(HStore.java:2031)
> at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2022)
> at
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:6408)
> at
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:6388)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2926)
>
>
> datanode log:
> 2020-12-17 10:00:24,290 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> hadoop-btdn0005.eniot.io:9866:DataXceiver error processing unknown operation
> src: /10.65.19.42:53894 dst: /10.65.19.41:9866
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password
> [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Block
> token with block_token_identifier (expiryDate=1608192045256,
> keyId=1449537289, userId=hbase,
> blockPoolId=BP-1601568648-10.65.19.12-1550823043026,
> blockId=-9223372036813273566, access modes=[READ], storageTypes= [DISK, DISK,
> DISK, DISK, DISK], storageIds= [DS-604a9eaf-94ba-4127-b70d-436361c49ecd,
> DS-b53bf503-84c1-4e9e-aede-a19f88e3fc9a,
> DS-a1ee0117-7430-4181-b279-3f1b16541567,
> DS-e8459904-e1e1-44a6-affa-9ce9fc56f877,
> DS-9f54936c-96cc-4c85-91e6-a6f8e00f246a]) is expired.]
> at
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598)
> at
> com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslParticipant.evaluateChallengeOrResponse(SaslParticipant.java:115)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:376)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:300)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:127)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Block
> token with block_token_identifier (expiryDate=1608192045256,
> keyId=1449537289, userId=hbase,
> blockPoolId=BP-1601568648-10.65.19.12-1550823043026,
> blockId=-9223372036813273566, access modes=[READ], storageTypes= [DISK, DISK,
> DISK, DISK, DISK], storageIds= [DS-604a9eaf-94ba-4127-b70d-436361c49ecd,
> DS-b53bf503-84c1-4e9e-aede-a19f88e3fc9a,
> DS-a1ee0117-7430-4181-b279-3f1b16541567,
> DS-e8459904-e1e1-44a6-affa-9ce9fc56f877,
> DS-9f54936c-96cc-4c85-91e6-a6f8e00f246a]) is expired.
> at
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.retrievePassword(BlockTokenSecretManager.java:479)
> at
> org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.retrievePassword(BlockPoolTokenSecretManager.java:81)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.buildServerPassword(SaslDataTransferServer.java:318)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.access$100(SaslDataTransferServer.java:73)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$2.apply(SaslDataTransferServer.java:297)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$SaslServerCallbackHandler.handle(SaslDataTransferServer.java:241)
> at
> com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
> ... 7 more
>
> final result
>
>
> java.io.IOException: Could not iterate StoreFileScanner[HFileScanner for
> reader
> reader=hdfs://azbeta/hbase/data/o15427722038191/RAW_569dfc35-10dc-4fe4-bb8c-5c403368d6c1/ccf146fb5c852d751b6b2f8307f81fa0/d/386bdb6978d348f4b4e168e728762842,
> compression=snappy, cacheConf=blockCache=LruBlockCache\{blockCount=7915,
> currentSize=489.13 MB, freeSize=2.71 GB, maxSize=3.19 GB, heapSize=489.13 MB,
> minSize=3.03 GB, minFactor=0.95, multiSize=1.52 GB, multiFactor=0.5,
> singleSize=775.81 MB, singleFactor=0.25}, cacheDataOnRead=true,
> cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false,
> cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false,
> firstKey=Optional[\x01\x00\x00\x13\xBD\x00\x00\x00T\x070\x97\xF7/d:arm_xingneng_test_model@AI1/1603445985858/Put/seqid=0],
>
> lastKey=Optional[\x01\x00\x00\x16\xF9\x00\x00\x00T\x073-8/d:arm_xingneng_test_model@AI99/1603446144699/Put/seqid=0],
> avgKeyLen=53, avgValueLen=4, entries=2016557, length=22957603,
> cur=\x01\x00\x00\x14#\x00\x00\x00T\x072\xCC\xF7/d:arm_xingneng_test_model@AI84/1603446131146/Put/vlen=4/seqid=17230816]
> at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:217)
> at
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:120)
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:653)
> at
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:388)
> at
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:327)
> at
> org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:65)
> at
> org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:126)
> at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1410)
> at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2187)
> at
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:596)
> at
> org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:638)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: 3 missing blocks, the stripe is:
> AlignedStripe(Offset=1048576, length=1048576, fetchedChunksNum=0,
> missingChunksNum=3); locatedBlocks is: LocatedBlocks\{; fileLength=22957603;
> underConstruction=false;
> blocks=[LocatedStripedBlock{BP-1197414916-10.27.20.30-1535978156945:blk_-9223372036830849056_317784750;
> getBlockSize()=22957603; corrupt=false; offset=0;
> locs=[DatanodeInfoWithStorage[10.27.20.42:9866,DS-d0d5a7ce-8280-45fe-b910-1e5c7b579367,DISK],
>
> DatanodeInfoWithStorage[10.27.22.86:9866,DS-b018c729-66d5-4953-94f1-ec7664a46cb7,DISK],
>
> DatanodeInfoWithStorage[10.27.22.79:9866,DS-d79c402a-1845-4b3f-893e-f84d94085b2a,DISK],
>
> DatanodeInfoWithStorage[10.27.20.41:9866,DS-b07ba1e0-9a34-4caf-ad51-6d5f302c08ea,DISK],
>
> DatanodeInfoWithStorage[10.27.20.39:9866,DS-8a059d87-f122-4fed-a6a2-e45662692305,DISK]];
> indices=[0, 1, 2, 3, 4]}];
> lastLocatedBlock=LocatedStripedBlock\{BP-1197414916-10.27.20.30-1535978156945:blk_-9223372036830849056_317784750;
> getBlockSize()=22957603; corrupt=false; offset=0;
> locs=[DatanodeInfoWithStorage[10.27.20.42:9866,DS-d0d5a7ce-8280-45fe-b910-1e5c7b579367,DISK],
>
> DatanodeInfoWithStorage[10.27.22.86:9866,DS-b018c729-66d5-4953-94f1-ec7664a46cb7,DISK],
>
> DatanodeInfoWithStorage[10.27.22.79:9866,DS-d79c402a-1845-4b3f-893e-f84d94085b2a,DISK],
>
> DatanodeInfoWithStorage[10.27.20.41:9866,DS-b07ba1e0-9a34-4caf-ad51-6d5f302c08ea,DISK],
>
> DatanodeInfoWithStorage[10.27.20.39:9866,DS-8a059d87-f122-4fed-a6a2-e45662692305,DISK]];
> indices=[0, 1, 2, 3, 4]}; isLastBlockComplete=true;
> ecPolicy=ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs,
> numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2]}
> at
> org.apache.hadoop.hdfs.StripeReader.checkMissingBlocks(StripeReader.java:177)
> at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:372)
> at
> org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:324)
> at
> org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:397)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:820)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:765)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1562)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1772)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1597)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1496)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readNextDataBlock(HFileReaderImpl.java:931)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.isNextBlock(HFileReaderImpl.java:1064)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.positionForNextBlock(HFileReaderImpl.java:1058)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl._next(HFileReaderImpl.java:1076)
> at
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.next(HFileReaderImpl.java:1097)
> at
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:208)
> ... 13 more
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]