[ https://issues.apache.org/jira/browse/HDDS-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wei-Chiu Chuang resolved HDDS-10536.
------------------------------------
    Resolution: Duplicate

Probably fixed by HDDS-10497

> [Hbase Ozone] BLOCK_TOKEN_VERIFICATION failed while long running YCSB
> ---------------------------------------------------------------------
>
>                 Key: HDDS-10536
>                 URL: https://issues.apache.org/jira/browse/HDDS-10536
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: OM
>            Reporter: Pratyush Bhatt
>            Assignee: Sammi Chen
>            Priority: Major
>
> YCSB fails after about 4 days (the command was triggered around _2024-03-12 17:49:28_ and failed around _2024-03-16 07:43:39_) with [INSERT-FAILED]; the underlying issue is a _BLOCK_TOKEN_VERIFICATION_ failure.
> _Error on the client side:_
> {code:java}
> 2024-03-16 07:43:29,233|INFO|Thread-36|machine.py:205 - run()||GUID=16aa1f97-ce35-4680-82a3-b8ac527277b2|2024-03-16 00:43:29:224 309240 sec: 7052507 operations; 8.9 current ops/sec; est completion in 16 days 17 hours [INSERT: Count=89, Max=2588671, Min=17104, Avg=404562.07, 90=2418687, 99=2545663, 99.9=2588671, 99.99=2588671]
> 2024-03-16 07:43:33,361|INFO|Thread-36|machine.py:205 - run()||GUID=16aa1f97-ce35-4680-82a3-b8ac527277b2|Error inserting, not retrying any more. number of attempts: 1 Insertion Retry Limit: 0
> 2024-03-16 07:43:39,245|INFO|Thread-36|machine.py:205 - run()||GUID=16aa1f97-ce35-4680-82a3-b8ac527277b2|2024-03-16 00:43:39:224 309250 sec: 7052548 operations; 4.1 current ops/sec; est completion in 16 days 17 hours [CLEANUP: Count=2, Max=268, Min=35, Avg=151.5, 90=268, 99=268, 99.9=268, 99.99=268] [INSERT: Count=41, Max=28426239, Min=24736, Avg=1247846.24, 90=2533375, 99=28426239, 99.9=28426239, 99.99=28426239] [INSERT-FAILED: Count=1, Max=38436863, Min=38404096, Avg=38420480, 90=38436863, 99=38436863, 99.9=38436863, 99.99=38436863]
> 2024-03-16 07:46:22,355|INFO|Thread-37|test_lrt_hbase_ozone.py:165 - open_files_helper()|Sleeping for 5 minutes.
> {code}
> Checked the HBase Master logs for the same time window (the cluster is in the PDT time zone; the QE pod is in UTC):
> {code:java}
> 2024-03-16 00:40:53,855 ERROR org.apache.hadoop.hdds.scm.XceiverClientGrpc: Failed to execute command ReadChunk on the pipeline Pipeline[ Id: 781daaaf-6183-45fd-9f1b-f0be538247be, Nodes: 0da8ff95-6d92-4956-83db-dee85e97488e(vc0121.example.com/10.17.207.31)a19eff5a-6feb-417b-b433-62a5f5af80bc(vc0127.example.com/10.17.207.37)b9d66e9e-8c2b-49ff-bf07-3fd3ea146a7f(vc0123.example.com/10.17.207.33), ReplicationConfig: STANDALONE/THREE, State:OPEN, leaderId:0da8ff95-6d92-4956-83db-dee85e97488e, CreationTimestamp2024-03-12T06:12:35.741-07:00[America/Los_Angeles]].
> 2024-03-16 00:40:53,855 WARN org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls: Failed to read chunk 113750153625707608_chunk_0 (len=12266) conID: 3060 locID: 113750153625707608 bcsId: 2782589 from 0da8ff95-6d92-4956-83db-dee85e97488e(vc0121.example.com/10.17.207.31); will try another datanode.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: BLOCK_TOKEN_VERIFICATION_FAILED for null: Expired token for user: hbase (auth:SIMPLE)
>     at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:675)
>     at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$createValidators$4(ContainerProtocolCalls.java:686)
>     at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:400)
>     at org.apache.hadoop.hdds.scm.XceiverClientGrpc.lambda$sendCommandWithTraceIDAndRetry$0(XceiverClientGrpc.java:340)
>     at org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:159)
>     at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
>     at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:335)
>     at org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:316)
>     at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:358)
>     at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$readChunk$2(ContainerProtocolCalls.java:345)
>     at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.tryEachDatanode(ContainerProtocolCalls.java:147)
>     at org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:344)
>     at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:425)
>     at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkDataIntoBuffers(ChunkInputStream.java:402)
>     at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:387)
>     at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:319)
>     at org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:173)
>     at org.apache.hadoop.hdds.scm.storage.ByteArrayReader.readFromBlock(ByteArrayReader.java:54)
>     at org.apache.hadoop.hdds.scm.storage.BlockInputStream.readWithStrategy(BlockInputStream.java:367)
>     at org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:56)
>     at org.apache.hadoop.hdds.scm.storage.ByteArrayReader.readFromBlock(ByteArrayReader.java:54)
>     at org.apache.hadoop.hdds.scm.storage.MultipartInputStream.readWithStrategy(MultipartInputStream.java:96)
>     at org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:56)
>     at org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:64)
>     at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:78)
>     at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:97)
>     at org.apache.hadoop.hbase.io.util.BlockIOUtils.preadWithExtra(BlockIOUtils.java:233)
>     at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1494)
>     at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1717)
>     at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1528)
>     at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1322)
>     at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1242)
>     at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:318)
>     at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:664)
>     at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:617)
> {code}
> I can see the Kerberos ticket was valid on the host. As mentioned before, in the cluster's krb5.conf the ticket lifetime is set to 1 day and the renewable lifetime to 8 days, and our background thread renews every 8 hours. The initial ticket would therefore have been obtained at _03/12/24 10:49:28_ (the renew-until limit of 03/20/24 10:49:28 minus the 8-day renewable lifetime), so _kinit -R_ appears to have run about 12 times:
> {code:java}
> [root@vc0122 ~]# klist
> Ticket cache: FILE:/tmp/krb5cc_0
> Default principal: hb...@example.com
>
> Valid starting     Expires            Service principal
> 03/16/24 10:49:28  03/17/24 10:49:28  krbtgt/example....@example.com
>         renew until 03/20/24 10:49:28
> {code}
> cc: [~weichiu]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
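[Editor's note] The failure mode in the stack trace above is a time-based check on the datanode side: a block token minted when the stream was opened is presented again days later and rejected as expired. A minimal sketch of that check, for illustration only (these class and method names are assumptions, not Ozone's actual block-token API):

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Illustrative sketch (NOT Ozone's real API): a block token carrying an
 * expiry timestamp, and the verification a datanode performs before
 * serving a ReadChunk request.
 */
public class BlockTokenExpirySketch {

    /** Hypothetical token shape: just a user and an expiry instant. */
    record BlockToken(String user, Instant expiry) {
        boolean isExpired(Instant now) {
            return now.isAfter(expiry);
        }
    }

    public static void main(String[] args) {
        // Token minted when the pipeline was created (2024-03-12, per the
        // CreationTimestamp in the log), with an assumed 1-day lifetime.
        Instant issued = Instant.parse("2024-03-12T13:12:35Z");
        BlockToken token = new BlockToken("hbase", issued.plus(Duration.ofDays(1)));

        // Days later, a long-lived cached input stream re-reads a chunk
        // with the same old token (2024-03-16, per the ERROR timestamp).
        Instant readTime = Instant.parse("2024-03-16T07:40:53Z");
        if (token.isExpired(readTime)) {
            // Mirrors the datanode-side rejection seen in the trace.
            System.out.println(
                "BLOCK_TOKEN_VERIFICATION_FAILED: Expired token for user: " + token.user());
        }
    }
}
```

This is consistent with the duplicate resolution: the fix belongs in the client (refreshing or re-fetching tokens for long-lived streams), not in YCSB or HBase.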
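[Editor's note] The renewal arithmetic in the report (renew-until limit minus the 8-day renewable lifetime gives the initial kinit time; one `kinit -R` every 8 hours over 4 days gives about 12 renewals) can be sanity-checked with a few lines. Dates are taken from the klist output above; the 8-hour interval is the background thread's schedule as described in the report:

```java
import java.time.Duration;
import java.time.LocalDateTime;

/** Back-of-envelope check of the "kinit -R ran about 12 times" estimate. */
public class RenewCountSketch {

    /** Number of whole renew intervals between the initial kinit and the observation. */
    static long renewCount(LocalDateTime initialKinit, LocalDateTime observed, Duration interval) {
        return Duration.between(initialKinit, observed).dividedBy(interval);
    }

    public static void main(String[] args) {
        // Renew-until limit 03/20/24 10:49:28 minus 8 days of renewable lifetime.
        LocalDateTime initialKinit = LocalDateTime.of(2024, 3, 12, 10, 49, 28);
        // "Valid starting" in the klist output, i.e. the last renewal observed.
        LocalDateTime observed = LocalDateTime.of(2024, 3, 16, 10, 49, 28);
        // 4 days / 8 hours per renewal = 12, matching the "about 12 times" estimate.
        System.out.println(renewCount(initialKinit, observed, Duration.ofHours(8))); // prints 12
    }
}
```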