[ https://issues.apache.org/jira/browse/HDDS-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang resolved HDDS-10536.
------------------------------------
    Resolution: Duplicate

Probably fixed by HDDS-10497

> [Hbase Ozone] BLOCK_TOKEN_VERIFICATION failed while long running YCSB
> ---------------------------------------------------------------------
>
>                 Key: HDDS-10536
>                 URL: https://issues.apache.org/jira/browse/HDDS-10536
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: OM
>            Reporter: Pratyush Bhatt
>            Assignee: Sammi Chen
>            Priority: Major
>
> YCSB fails after about 4 days (the command was triggered around _2024-03-12 
> 17:49:28_ and failed around _2024-03-16 07:43:39_) with [INSERT-FAILED]; the 
> underlying cause is a _BLOCK_TOKEN_VERIFICATION_ failure (a quick check of 
> the elapsed time follows the client log below).
> _Error on the client side:_
> {code:java}
> 2024-03-16 07:43:29,233|INFO|Thread-36|machine.py:205 - 
> run()||GUID=16aa1f97-ce35-4680-82a3-b8ac527277b2|2024-03-16 00:43:29:224 
> 309240 sec: 7052507 operations; 8.9 current ops/sec; est completion in 16 
> days 17 hours [INSERT: Count=89, Max=2588671, Min=17104, Avg=404562.07, 
> 90=2418687, 99=2545663, 99.9=2588671, 99.99=2588671]
> 2024-03-16 07:43:33,361|INFO|Thread-36|machine.py:205 - 
> run()||GUID=16aa1f97-ce35-4680-82a3-b8ac527277b2|Error inserting, not 
> retrying any more. number of attempts: 1Insertion Retry Limit: 0
> 2024-03-16 07:43:39,245|INFO|Thread-36|machine.py:205 - 
> run()||GUID=16aa1f97-ce35-4680-82a3-b8ac527277b2|2024-03-16 00:43:39:224 
> 309250 sec: 7052548 operations; 4.1 current ops/sec; est completion in 16 
> days 17 hours [CLEANUP: Count=2, Max=268, Min=35, Avg=151.5, 90=268, 99=268, 
> 99.9=268, 99.99=268] [INSERT: Count=41, Max=28426239, Min=24736, 
> Avg=1247846.24, 90=2533375, 99=28426239, 99.9=28426239, 99.99=28426239] 
> [INSERT-FAILED: Count=1, Max=38436863, Min=38404096, Avg=38420480, 
> 90=38436863, 99=38436863, 99.9=38436863, 99.99=38436863]
> 2024-03-16 07:46:22,355|INFO|Thread-37|test_lrt_hbase_ozone.py:165 - 
> open_files_helper()|Sleeping for 5 minutes. {code}
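> A quick {{java.time}} sanity check of the elapsed time (a small sketch added 
> for clarity, using only the two timestamps above); the "~4 days" is about 86 
> hours, which lines up with the "309250 sec" elapsed time YCSB itself reports:
> {code:java}
> import java.time.Duration;
> import java.time.LocalDateTime;
> import java.time.format.DateTimeFormatter;
> 
> public class RunDuration {
>   public static void main(String[] args) {
>     DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
>     LocalDateTime start = LocalDateTime.parse("2024-03-12 17:49:28", fmt); // command triggered (UTC, QE pod)
>     LocalDateTime fail  = LocalDateTime.parse("2024-03-16 07:43:39", fmt); // failure time from the log above (UTC)
>     Duration run = Duration.between(start, fail);
>     // Prints: 85h 54m (309251 s), consistent with the 309250 sec reported by YCSB
>     System.out.println(run.toHours() + "h " + run.toMinutesPart() + "m (" + run.toSeconds() + " s)");
>   }
> }
> {code}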
> Checked the HBase Master logs from the same time window (the cluster is in 
> the PDT timezone, while the QE pod is in UTC; a small timestamp-conversion 
> sketch follows the excerpt):
> {code:java}
> 2024-03-16 00:40:53,855 ERROR org.apache.hadoop.hdds.scm.XceiverClientGrpc: 
> Failed to execute command ReadChunk on the pipeline Pipeline[ Id: 
> 781daaaf-6183-45fd-9f1b-f0be538247be, Nodes: 
> 0da8ff95-6d92-4956-83db-dee85e97488e(vc0121.example.com/10.17.207.31)a19eff5a-6feb-417b-b433-62a5f5af80bc(vc0127.example.com/10.17.207.37)b9d66e9e-8c2b-49ff-bf07-3fd3ea146a7f(vc0123.example.com/10.17.207.33),
>  ReplicationConfig: STANDALONE/THREE, State:OPEN, 
> leaderId:0da8ff95-6d92-4956-83db-dee85e97488e, 
> CreationTimestamp2024-03-12T06:12:35.741-07:00[America/Los_Angeles]].
> 2024-03-16 00:40:53,855 WARN 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls: Failed to read 
> chunk 113750153625707608_chunk_0 (len=12266) conID: 3060 locID: 
> 113750153625707608 bcsId: 2782589 from 
> 0da8ff95-6d92-4956-83db-dee85e97488e(vc0121.example.com/10.17.207.31); will 
> try another datanode.
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  BLOCK_TOKEN_VERIFICATION_FAILED for null: Expired token for user: hbase 
> (auth:SIMPLE)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.validateContainerResponse(ContainerProtocolCalls.java:675)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$createValidators$4(ContainerProtocolCalls.java:686)
>         at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithRetry(XceiverClientGrpc.java:400)
>         at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.lambda$sendCommandWithTraceIDAndRetry$0(XceiverClientGrpc.java:340)
>         at 
> org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:159)
>         at 
> org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
>         at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommandWithTraceIDAndRetry(XceiverClientGrpc.java:335)
>         at 
> org.apache.hadoop.hdds.scm.XceiverClientGrpc.sendCommand(XceiverClientGrpc.java:316)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:358)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.lambda$readChunk$2(ContainerProtocolCalls.java:345)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.tryEachDatanode(ContainerProtocolCalls.java:147)
>         at 
> org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.readChunk(ContainerProtocolCalls.java:344)
>         at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunk(ChunkInputStream.java:425)
>         at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkDataIntoBuffers(ChunkInputStream.java:402)
>         at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.readChunkFromContainer(ChunkInputStream.java:387)
>         at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.prepareRead(ChunkInputStream.java:319)
>         at 
> org.apache.hadoop.hdds.scm.storage.ChunkInputStream.read(ChunkInputStream.java:173)
>         at 
> org.apache.hadoop.hdds.scm.storage.ByteArrayReader.readFromBlock(ByteArrayReader.java:54)
>         at 
> org.apache.hadoop.hdds.scm.storage.BlockInputStream.readWithStrategy(BlockInputStream.java:367)
>         at 
> org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:56)
>         at 
> org.apache.hadoop.hdds.scm.storage.ByteArrayReader.readFromBlock(ByteArrayReader.java:54)
>         at 
> org.apache.hadoop.hdds.scm.storage.MultipartInputStream.readWithStrategy(MultipartInputStream.java:96)
>         at 
> org.apache.hadoop.hdds.scm.storage.ExtendedInputStream.read(ExtendedInputStream.java:56)
>         at 
> org.apache.hadoop.fs.ozone.OzoneFSInputStream.read(OzoneFSInputStream.java:64)
>         at org.apache.hadoop.fs.FSInputStream.read(FSInputStream.java:78)
>         at 
> org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:97)
>         at 
> org.apache.hadoop.hbase.io.util.BlockIOUtils.preadWithExtra(BlockIOUtils.java:233)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readAtOffset(HFileBlock.java:1494)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1717)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1528)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1322)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1242)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:318)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:664)
>         at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:617)
>  {code}
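> To line up the PDT timestamps from the Master log with the UTC timestamps 
> from the QE pod, a minimal {{java.time}} conversion (sketch added for 
> clarity; the 00:40:53 value is taken from the log above):
> {code:java}
> import java.time.LocalDateTime;
> import java.time.ZoneId;
> import java.time.ZoneOffset;
> import java.time.ZonedDateTime;
> import java.time.format.DateTimeFormatter;
> 
> public class LogTimeAlign {
>   public static void main(String[] args) {
>     DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
>     // HBase Master log line, cluster clock (America/Los_Angeles, i.e. PDT / UTC-7 in mid-March 2024)
>     LocalDateTime masterLocal = LocalDateTime.parse("2024-03-16 00:40:53", fmt);
>     ZonedDateTime masterUtc = masterLocal.atZone(ZoneId.of("America/Los_Angeles"))
>                                          .withZoneSameInstant(ZoneOffset.UTC);
>     // Prints 2024-03-16T07:40:53Z, a few minutes before the 07:43:33 UTC "Error inserting" line above
>     System.out.println(masterUtc);
>   }
> }
> {code}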
> The Kerberos ticket was still valid on the host. As mentioned before, in the 
> cluster's krb5.conf the ticket lifetime is set to 1d and the renewable 
> lifetime to 8d, and our background thread renews the ticket every 8 hours. 
> The ticket would have been obtained at the renew-until limit (03/20/24 
> 10:49:28) minus the 8-day renewable lifetime, i.e. around _03/12/24 
> 10:49:28_, so _kinit -R_ appears to have run about 12 times (a quick 
> arithmetic check follows the klist output):
> {code:java}
> [root@vc0122 ~]# klist
> Ticket cache: FILE:/tmp/krb5cc_0
> Default principal: hb...@example.com
> 
> Valid starting     Expires            Service principal
> 03/16/24 10:49:28  03/17/24 10:49:28  krbtgt/example....@example.com
>         renew until 03/20/24 10:49:28
> {code}
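> The renewal-count check, using only the timestamps visible in the klist 
> output above and the 8-hour interval of our renewal thread (sketch added for 
> clarity):
> {code:java}
> import java.time.Duration;
> import java.time.LocalDateTime;
> import java.time.format.DateTimeFormatter;
> 
> public class RenewalCount {
>   public static void main(String[] args) {
>     DateTimeFormatter fmt = DateTimeFormatter.ofPattern("MM/dd/yy HH:mm:ss");
>     // Original acquisition = "renew until" (03/20/24 10:49:28) minus the 8-day renewable lifetime
>     LocalDateTime acquired = LocalDateTime.parse("03/12/24 10:49:28", fmt);
>     // "Valid starting" above, i.e. the most recent renewal
>     LocalDateTime latestRenewal = LocalDateTime.parse("03/16/24 10:49:28", fmt);
>     long renewals = Duration.between(acquired, latestRenewal).toHours() / 8;
>     System.out.println(renewals + " renewals"); // 96 h / 8 h = 12, matching the "about 12 times" estimate
>   }
> }
> {code}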
> cc: [~weichiu] 


