[ 
https://issues.apache.org/jira/browse/HDDS-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pratyush Bhatt updated HDDS-11374:
----------------------------------
    Summary: RegionServer Crashed with "java.io.InterruptedIOException"  (was: 
RegionServer Crashed with "java.io.InterruptedIOExceptio")

> RegionServer Crashed with "java.io.InterruptedIOException"
> ----------------------------------------------------------
>
>                 Key: HDDS-11374
>                 URL: https://issues.apache.org/jira/browse/HDDS-11374
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Pratyush Bhatt
>            Priority: Major
>
> Ran on a setup with HDDS-9844 and the fix for KeyoutputStream HDDS-11239
> ozone.client.key.write.concurrency is set to -1 for unlimited concurrency.
> Triggered Bulkload and LoadTestTool on the cluster with Hbase on Ozone.
> The RegionServer Crashed abruptly.
> Logs just before the crash:
> {code:java}
> 2024-08-27 05:17:27,551 ERROR 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Error syncing, request close 
> of WAL
> java.io.InterruptedIOException
>         at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:602)
>         at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.hsync(KeyOutputStream.java:550)
>         at 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream.hsync(OzoneOutputStream.java:118)
>         at 
> org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:184)
>         at 
> org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
>         at 
> org.apache.hadoop.fs.ozone.OzoneFSOutputStream.hsync(OzoneFSOutputStream.java:80)
>         at 
> org.apache.hadoop.fs.ozone.OzoneFSOutputStream.hflush(OzoneFSOutputStream.java:75)
>         at 
> org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:136)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:84)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:669)
> 2024-08-27 05:17:27,552 WARN 
> org.apache.hadoop.ozone.client.io.KeyOutputStream: Encountered exception 
> java.io.IOException: Unexpected Storage Container Exception: 
> java.lang.InterruptedException on the pipeline Pipeline[ Id: 
> bdab774b-98bf-4f51-8986-ad27889b186a, Nodes: 
> c02f4835-7103-4c1c-94b7-65f7d0ce02bd(vc0133.xyz/10.17.207.43) ReplicaIndex: 
> 0319a1376-3f35-44a4-85e7-c364b9cb03f9(vc0132.xyz/10.17.207.42) ReplicaIndex: 
> 080eebcdd-5257-4a30-98ce-c53943ba1016(vc0136.xyz/10.17.207.46) ReplicaIndex: 
> 0, ReplicationConfig: RATIS/THREE, State:OPEN, 
> leaderId:319a1376-3f35-44a4-85e7-c364b9cb03f9, 
> CreationTimestamp2024-08-27T03:26:53.789-07:00[America/Los_Angeles]]. The 
> last committed block length is 2362474, uncommitted data length is 0 retry 
> count 0 {code}
> Logs during the crash:
> {code:java}
> 2024-08-27 05:17:27,554 ERROR 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog: Error syncing, request close 
> of WAL
> java.io.InterruptedIOException: Interrupted waiting to send RPC request to 
> server
>         at org.apache.hadoop.ipc.Client.call(Client.java:1517)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1459)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>         at com.sun.proxy.$Proxy25.submitRequest(Unknown Source)
>         at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
>         at com.sun.proxy.$Proxy25.submitRequest(Unknown Source)
>         at 
> org.apache.hadoop.ozone.om.protocolPB.Hadoop3OmTransport.submitRequest(Hadoop3OmTransport.java:80)
>         at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.submitRequest(OzoneManagerProtocolClientSideTranslatorPB.java:343)
>         at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.updateKey(OzoneManagerProtocolClientSideTranslatorPB.java:862)
>         at 
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.hsyncKey(OzoneManagerProtocolClientSideTranslatorPB.java:807)
>         at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.lambda$hsyncKey$1(BlockOutputStreamEntryPool.java:359)
>         at 
> org.apache.hadoop.ozone.util.MetricUtil.captureLatencyNs(MetricUtil.java:78)
>         at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntryPool.hsyncKey(BlockOutputStreamEntryPool.java:358)
>         at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.lambda$null$3(KeyOutputStream.java:556)
>         at 
> org.apache.hadoop.ozone.util.MetricUtil.captureLatencyNs(MetricUtil.java:78)
>         at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.lambda$hsync$4(KeyOutputStream.java:555)
>         at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.doInWriteLock(KeyOutputStream.java:260)
>         at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.hsync(KeyOutputStream.java:552)
>         at 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream.hsync(OzoneOutputStream.java:118)
>         at 
> org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:184)
>         at 
> org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
>         at 
> org.apache.hadoop.fs.ozone.OzoneFSOutputStream.hsync(OzoneFSOutputStream.java:80)
>         at 
> org.apache.hadoop.fs.ozone.OzoneFSOutputStream.hflush(OzoneFSOutputStream.java:75)
>         at 
> org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:136)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:84)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:669)
> Caused by: java.lang.InterruptedException
>         at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:191)
>         at 
> org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1220)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1512)
>         ... 33 more
> 2024-08-27 05:17:27,554 INFO 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry: 
> BlockOutputStreamEntry:hbase/WALs/vc0139.xyz,22101,1724755172412/vc0139.xyz%2C22101%2C1724755172412.vc0139.xyz%2C22101%2C1724755172412.regiongroup-3.1724761021452
>  conID: 3070 locID: 113750153625747794 bcsId: 0 replicaIndex: null : Done 
> waiting for retry handling.
> 2024-08-27 05:17:27,555 INFO 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL: Slow sync cost: 332 
> ms, current pipeline: []
> 2024-08-27 05:17:27,686 ERROR 
> org.apache.hadoop.hbase.regionserver.HRegionServer: ***** ABORTING region 
> server vc0139.xyz,22101,1724755172412: Failed log close in log roller *****
> org.apache.hadoop.hbase.regionserver.wal.FailedLogCloseException: 
> ofs://ozone1724523753/hbasevol322708/hbasebuck/hbase/WALs/vc0139.xyz,22101,1724755172412/vc0139.xyz%2C22101%2C1724755172412.vc0139.xyz%2C22101%2C1724755172412.regiongroup-4.1724761011264,
>  unflushedEntries=1
>         at 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog.doReplaceWriter(FSHLog.java:428)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog.doReplaceWriter(FSHLog.java:71)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.replaceWriter(AbstractFSWAL.java:830)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:889)
>         at 
> org.apache.hadoop.hbase.wal.AbstractWALRoller$RollController.rollWal(AbstractWALRoller.java:304)
>         at 
> org.apache.hadoop.hbase.wal.AbstractWALRoller.run(AbstractWALRoller.java:211)
> Caused by: 
> org.apache.hadoop.hbase.regionserver.wal.FailedSyncBeforeLogCloseException: 
> org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: Append 
> sequenceId=25700, requesting roll of WAL
>         at 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$SafePointZigZagLatch.checkIfSyncFailed(FSHLog.java:908)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$SafePointZigZagLatch.waitSafePoint(FSHLog.java:924)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog.doReplaceWriter(FSHLog.java:368)
>         ... 5 more
> Caused by: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: 
> Append sequenceId=25700, requesting roll of WAL
>         at 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:1208)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1081)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:982)
>         at 
> com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:168)
>         at 
> com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: : Stream is closed! Key: 
> hbase/WALs/vc0139.xyz,22101,1724755172412/vc0139.xyz%2C22101%2C1724755172412.vc0139.xyz%2C22101%2C1724755172412.regiongroup-4.1724761011264
>         at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.checkNotClosed(KeyOutputStream.java:856)
>         at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:236)
>         at 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:94)
>         at 
> org.apache.hadoop.fs.ozone.OzoneFSOutputStream.lambda$write$1(OzoneFSOutputStream.java:58)
>         at 
> org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:184)
>         at 
> org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:149)
>         at 
> org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:54)
>         at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:62)
>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.doFlush(CodedOutputStream.java:3041)
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.flushIfNotAvailable(CodedOutputStream.java:3036)
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeUInt64(CodedOutputStream.java:2726)
>         at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.WALProtos$WALKey.writeTo(WALProtos.java:1887)
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:95)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.append(ProtobufLogWriter.java:55)
>  {code}
> cc: [~duongnguyen] [~weichiu] [~ashishkr] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to