[ https://issues.apache.org/jira/browse/HDFS-15732?focusedWorklogId=525628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-525628 ]
ASF GitHub Bot logged work on HDFS-15732:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 17/Dec/20 16:10
Start Date: 17/Dec/20 16:10
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #2557:
URL: https://github.com/apache/hadoop/pull/2557#issuecomment-747538572

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 32s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | | 0m 0s | [test4tests](test4tests) | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 13m 53s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 20m 42s | | trunk passed |
| +1 :green_heart: | compile | 19m 58s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 :green_heart: | compile | 17m 19s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 :green_heart: | checkstyle | 2m 41s | | trunk passed |
| +1 :green_heart: | mvnsite | 2m 39s | | trunk passed |
| +1 :green_heart: | shadedclient | 21m 10s | | branch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 1m 56s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 :green_heart: | javadoc | 2m 33s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +0 :ok: | spotbugs | 2m 51s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 5m 26s | | trunk passed |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 28s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 51s | | the patch passed |
| +1 :green_heart: | compile | 20m 11s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 :green_heart: | javac | 20m 11s | | the patch passed |
| +1 :green_heart: | compile | 17m 20s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 :green_heart: | javac | 17m 20s | | the patch passed |
| +1 :green_heart: | checkstyle | 2m 39s | | the patch passed |
| +1 :green_heart: | mvnsite | 2m 36s | | the patch passed |
| +1 :green_heart: | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 :green_heart: | shadedclient | 15m 13s | | patch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 1m 56s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 :green_heart: | javadoc | 2m 29s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 :green_heart: | findbugs | 5m 8s | | the patch passed |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 9m 33s | | hadoop-common in the patch passed. |
| +1 :green_heart: | unit | 2m 35s | | hadoop-hdfs-client in the patch passed. |
| +1 :green_heart: | asflicense | 0m 54s | | The patch does not generate ASF License warnings. |
| | | 192m 7s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2557/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/2557 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 67ae9387d358 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 4c033bafa02 |
| Default Java | Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2557/1/testReport/ |
| Max. process+thread count | 3249 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2557/1/console |
| versions | git=2.17.1 maven=3.6.0 findbugs=4.0.6 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.
Issue Time Tracking
-------------------

Worklog Id: (was: 525628)
Time Spent: 1h (was: 50m)

> EC client will not retry get block token when block token expired in kerberized cluster
> ----------------------------------------------------------------------------------------
>
> Key: HDFS-15732
> URL: https://issues.apache.org/jira/browse/HDFS-15732
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: dfsclient, ec, erasure-coding
> Affects Versions: 3.1.1
> Environment: hadoop 3.1.1
> kerberos
> ec RS-3-2-1024k
> Reporter: gaozhan ding
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> When we enabled an EC policy for HBase, we ran into the following problem.
> Once a block token has expired on the DataNode side, the client does not
> recognize the InvalidToken error, because the DataNode rejects the token
> during SASL negotiation and the client sees only a generic IOException
> ("DIGEST-MD5: IO error acquiring password"). As a result, the EC client does
> not retry by refetching the token when it creates a block reader. The peer
> DataNode is instead added to deadNodes, and every later call to
> createBlockReader in the current DFSStripedInputStream that targets this
> DataNode treats it as dead and returns false. The final result is a read
> failure.
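The description boils down to one missing decision: the client's token-refetch path only fires on a recognized invalid-token error, and the expired token arrives instead as a generic SASL IOException, so the DataNode is marked dead on the first failure. The Java sketch below shows the kind of handling the report asks for: classify that SASL failure as a token problem, refetch the token once, and only then give up on the node. It is a hypothetical illustration, not the HDFS-15732 patch: `Connector`, `TokenSource`, `createReader` and the message-matching heuristic are all assumptions, and matching on exception text is fragile and used here only to keep the example self-contained.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch, not the HDFS-15732 patch: treat a SASL handshake
 * failure caused by an expired block token as retriable, refetch the token
 * once, and mark the DataNode dead only if the retry fails as well.
 */
public class TokenRetrySketch {

  /** Hypothetical stand-in for opening a block reader to one DataNode. */
  public interface Connector {
    void connect(String datanode, long tokenExpiryMs) throws IOException;
  }

  /** Hypothetical stand-in for refetching the block location and token. */
  public interface TokenSource {
    long refetchToken();
  }

  // Text of the generic IOException the client receives when the DataNode
  // rejects an expired token during the DIGEST-MD5 handshake (see the logs).
  static final String SASL_PASSWORD_ERROR = "IO error acquiring password";

  /** Heuristic only: classifies the SASL failure as a token problem. */
  static boolean looksLikeTokenError(IOException e) {
    String msg = e.getMessage();
    return msg != null && msg.contains(SASL_PASSWORD_ERROR);
  }

  /** Returns true if a reader was created; otherwise adds the node to deadNodes. */
  public static boolean createReader(String datanode, long tokenExpiryMs,
      Connector connector, TokenSource tokens, Set<String> deadNodes) {
    boolean refetched = false;
    while (true) {
      try {
        connector.connect(datanode, tokenExpiryMs);
        return true;
      } catch (IOException e) {
        if (!refetched && looksLikeTokenError(e)) {
          tokenExpiryMs = tokens.refetchToken(); // one retry with a fresh token
          refetched = true;
          continue;
        }
        deadNodes.add(datanode); // other failure, or token still rejected
        return false;
      }
    }
  }

  public static void main(String[] args) {
    final long now = 1_000L;
    // Simulated DataNode: rejects any token that expired before "now".
    Connector dn = (node, expiry) -> {
      if (expiry < now) {
        throw new IOException("DIGEST-MD5: IO error acquiring password");
      }
    };
    Set<String> deadNodes = new HashSet<>();
    boolean ok = createReader("10.65.19.41:9866", now - 1, dn,
        () -> now + 600_000L, deadNodes);
    System.out.println(ok + " " + deadNodes); // prints "true []"
  }
}
```

Limiting the refetch to a single attempt keeps a DataNode that keeps rejecting fresh tokens from looping forever. Any non-token failure still goes straight to the dead-node list, as in the behaviour the report describes.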
> Some logs:
> hbase regionserver:
> 2020-12-17 10:00:24,291 WARN [RpcServer.default.FPBQ.Fifo.handler=15,queue=0,port=16020] hdfs.DFSClient: Failed to connect to /10.65.19.41:9866 for blockBP-1601568648-10.65.19.12-1550823043026:blk_-9223372036813273566_672859566
> java.io.IOException: DIGEST-MD5: IO error acquiring password
>   at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:421)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:479)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getSaslStreams(SaslDataTransferClient.java:393)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:267)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:215)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:160)
>   at org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:647)
>   at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2936)
>   at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:821)
>   at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:746)
>   at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:379)
>   at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:647)
>   at org.apache.hadoop.hdfs.DFSStripedInputStream.createBlockReader(DFSStripedInputStream.java:272)
>   at org.apache.hadoop.hdfs.StripeReader.readChunk(StripeReader.java:333)
>   at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:365)
>   at org.apache.hadoop.hdfs.DFSStripedInputStream.fetchBlockByteRange(DFSStripedInputStream.java:514)
>   at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1354)
>   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1318)
>   at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:92)
>   at org.apache.hadoop.hbase.io.hfile.HFileBlock.positionalReadWithExtra(HFileBlock.java:808)
>   at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1568)
>   at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1772)
>   at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1597)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1496)
>   at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:340)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:856)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:806)
>   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:327)
>   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:228)
>   at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:395)
>   at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:250)
>   at org.apache.hadoop.hbase.regionserver.HStore.createScanner(HStore.java:2031)
>   at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2022)
>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:6408)
>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:6388)
>   at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2926)
>
>
> datanode log:
> 2020-12-17 10:00:24,290 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hadoop-btdn0005.eniot.io:9866:DataXceiver error processing unknown operation src: /10.65.19.42:53894 dst: /10.65.19.41:9866
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: Block token with block_token_identifier (expiryDate=1608192045256, keyId=1449537289, userId=hbase, blockPoolId=BP-1601568648-10.65.19.12-1550823043026, blockId=-9223372036813273566, access modes=[READ], storageTypes= [DISK, DISK, DISK, DISK, DISK], storageIds= [DS-604a9eaf-94ba-4127-b70d-436361c49ecd, DS-b53bf503-84c1-4e9e-aede-a19f88e3fc9a, DS-a1ee0117-7430-4181-b279-3f1b16541567, DS-e8459904-e1e1-44a6-affa-9ce9fc56f877, DS-9f54936c-96cc-4c85-91e6-a6f8e00f246a]) is expired.]
>   at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:598)
>   at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslParticipant.evaluateChallengeOrResponse(SaslParticipant.java:115)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:376)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:300)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:127)
>   at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Block token with block_token_identifier (expiryDate=1608192045256, keyId=1449537289, userId=hbase, blockPoolId=BP-1601568648-10.65.19.12-1550823043026, blockId=-9223372036813273566, access modes=[READ], storageTypes= [DISK, DISK, DISK, DISK, DISK], storageIds= [DS-604a9eaf-94ba-4127-b70d-436361c49ecd, DS-b53bf503-84c1-4e9e-aede-a19f88e3fc9a, DS-a1ee0117-7430-4181-b279-3f1b16541567, DS-e8459904-e1e1-44a6-affa-9ce9fc56f877, DS-9f54936c-96cc-4c85-91e6-a6f8e00f246a]) is expired.
>   at org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.retrievePassword(BlockTokenSecretManager.java:479)
>   at org.apache.hadoop.hdfs.security.token.block.BlockPoolTokenSecretManager.retrievePassword(BlockPoolTokenSecretManager.java:81)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.buildServerPassword(SaslDataTransferServer.java:318)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.access$100(SaslDataTransferServer.java:73)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$2.apply(SaslDataTransferServer.java:297)
>   at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer$SaslServerCallbackHandler.handle(SaslDataTransferServer.java:241)
>   at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
>   ... 7 more
>
> final result
>
>
> java.io.IOException: Could not iterate StoreFileScanner[HFileScanner for reader reader=hdfs://azbeta/hbase/data/o15427722038191/RAW_569dfc35-10dc-4fe4-bb8c-5c403368d6c1/ccf146fb5c852d751b6b2f8307f81fa0/d/386bdb6978d348f4b4e168e728762842, compression=snappy, cacheConf=blockCache=LruBlockCache{blockCount=7915, currentSize=489.13 MB, freeSize=2.71 GB, maxSize=3.19 GB, heapSize=489.13 MB, minSize=3.03 GB, minFactor=0.95, multiSize=1.52 GB, multiFactor=0.5, singleSize=775.81 MB, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false, firstKey=Optional[\x01\x00\x00\x13\xBD\x00\x00\x00T\x070\x97\xF7/d:arm_xingneng_test_model@AI1/1603445985858/Put/seqid=0], lastKey=Optional[\x01\x00\x00\x16\xF9\x00\x00\x00T\x073-8/d:arm_xingneng_test_model@AI99/1603446144699/Put/seqid=0], avgKeyLen=53, avgValueLen=4, entries=2016557, length=22957603, cur=\x01\x00\x00\x14#\x00\x00\x00T\x072\xCC\xF7/d:arm_xingneng_test_model@AI84/1603446131146/Put/vlen=4/seqid=17230816]
>   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:217)
>   at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:120)
>   at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:653)
>   at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:388)
>   at org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:327)
>   at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:65)
>   at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:126)
>   at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1410)
>   at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2187)
>   at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:596)
>   at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:638)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: 3 missing blocks, the stripe is: AlignedStripe(Offset=1048576, length=1048576, fetchedChunksNum=0, missingChunksNum=3); locatedBlocks is: LocatedBlocks{; fileLength=22957603; underConstruction=false; blocks=[LocatedStripedBlock{BP-1197414916-10.27.20.30-1535978156945:blk_-9223372036830849056_317784750; getBlockSize()=22957603; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.27.20.42:9866,DS-d0d5a7ce-8280-45fe-b910-1e5c7b579367,DISK], DatanodeInfoWithStorage[10.27.22.86:9866,DS-b018c729-66d5-4953-94f1-ec7664a46cb7,DISK], DatanodeInfoWithStorage[10.27.22.79:9866,DS-d79c402a-1845-4b3f-893e-f84d94085b2a,DISK], DatanodeInfoWithStorage[10.27.20.41:9866,DS-b07ba1e0-9a34-4caf-ad51-6d5f302c08ea,DISK], DatanodeInfoWithStorage[10.27.20.39:9866,DS-8a059d87-f122-4fed-a6a2-e45662692305,DISK]]; indices=[0, 1, 2, 3, 4]}]; lastLocatedBlock=LocatedStripedBlock{BP-1197414916-10.27.20.30-1535978156945:blk_-9223372036830849056_317784750; getBlockSize()=22957603; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[10.27.20.42:9866,DS-d0d5a7ce-8280-45fe-b910-1e5c7b579367,DISK], DatanodeInfoWithStorage[10.27.22.86:9866,DS-b018c729-66d5-4953-94f1-ec7664a46cb7,DISK], DatanodeInfoWithStorage[10.27.22.79:9866,DS-d79c402a-1845-4b3f-893e-f84d94085b2a,DISK], DatanodeInfoWithStorage[10.27.20.41:9866,DS-b07ba1e0-9a34-4caf-ad51-6d5f302c08ea,DISK], DatanodeInfoWithStorage[10.27.20.39:9866,DS-8a059d87-f122-4fed-a6a2-e45662692305,DISK]]; indices=[0, 1, 2, 3, 4]}; isLastBlockComplete=true; ecPolicy=ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2]}
>   at org.apache.hadoop.hdfs.StripeReader.checkMissingBlocks(StripeReader.java:177)
>   at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:372)
>   at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:324)
>   at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:397)
>   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:820)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:765)
>   at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1562)
>   at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1772)
>   at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1597)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1496)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readNextDataBlock(HFileReaderImpl.java:931)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.isNextBlock(HFileReaderImpl.java:1064)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.positionForNextBlock(HFileReaderImpl.java:1058)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl._next(HFileReaderImpl.java:1076)
>   at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.next(HFileReaderImpl.java:1097)
>   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:208)
>   ... 13 more
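The final "3 missing blocks" error follows from the erasure-coding arithmetic of the RS-3-2-1024k policy shown in the trace (numDataUnits=3, numParityUnits=2, CellSize=1048576). A Reed-Solomon stripe with k data and m parity cells can be rebuilt from any k of its k + m cells, so it survives at most m missing cells. With missingChunksNum=3, only 2 of the 5 cells were still reachable, one short of the 3 needed. The Java sketch below states that rule. `StripeRecoverability` and `canDecode` are hypothetical helpers, and we assume, from the wording of the error message, that `StripeReader.checkMissingBlocks` applies the equivalent "missing chunks greater than parity units" test.

```java
/**
 * Hypothetical sketch of the recoverability rule behind "3 missing blocks":
 * a Reed-Solomon stripe with k data units and m parity units can be rebuilt
 * from any k of its k + m chunks, so a read fails once more than m are missing.
 */
public class StripeRecoverability {

  public static boolean canDecode(int numDataUnits, int numParityUnits, int missingChunks) {
    int total = numDataUnits + numParityUnits;
    if (missingChunks < 0 || missingChunks > total) {
      throw new IllegalArgumentException("missingChunks out of range: " + missingChunks);
    }
    // Equivalent to missingChunks <= numParityUnits.
    return total - missingChunks >= numDataUnits;
  }

  public static void main(String[] args) {
    // RS-3-2-1024k from the log: numDataUnits=3, numParityUnits=2.
    for (int missing = 0; missing <= 5; missing++) {
      System.out.println(missing + " missing -> "
          + (canDecode(3, 2, missing) ? "decodable" : "read fails"));
    }
  }
}
```

For RS-3-2 this prints "decodable" for 0 to 2 missing cells and "read fails" for 3 to 5. That is why three DataNodes wrongly placed on the dead list were enough to turn an expired token into a hard read failure.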