[
https://issues.apache.org/jira/browse/HDDS-9342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794305#comment-17794305
]
Sammi Chen commented on HDDS-9342:
----------------------------------
Together with [~wanghongbing] and [~z-bb], we found the root cause of this.
There are three major issues in this case.
The first is a long full GC, which caused the OM to terminate itself:
{code:java}
2023-10-24 11:40:08,542 [JvmPauseMonitor0] WARN
org.apache.ratis.util.JvmPauseMonitor: JvmPauseMonitor-om2: Detected pause in
JVM or host machine approximately 131.539s with 131.735s GC time.
GC pool 'ParNew' had collection(s): count=1 time=1422ms
GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=130313ms
...
2023-10-24 11:40:08,552 [JvmPauseMonitor0] ERROR
org.apache.ratis.server.RaftServer: om2: JVM pause detected 131.539s longer
than the close-threshold 120s, shutting down ... {code}
The second is that the gap between the snapshot index taken during OM shutdown
and the commitIndex in Raft is huge, so many Raft log entries must be replayed
at the next OM start; this also caused the third issue:
{code:java}
2023-10-24 11:40:08,557 [om2-impl-thread2] INFO
org.apache.ratis.server.impl.StateMachineUpdater:
om2@group-197E298202B9-StateMachineUpdater: set stopIndex = 405823932
...
2023-10-24 11:40:13,773 [om2@group-197E298202B9-StateMachineUpdater] INFO
org.apache.ratis.server.impl.StateMachineUpdater:
om2@group-197E298202B9-StateMachineUpdater: Took a snapshot at index 206114616
2023-10-24 11:40:13,776 [om2@group-197E298202B9-StateMachineUpdater] INFO
org.apache.ratis.server.impl.StateMachineUpdater:
om2@group-197E298202B9-StateMachineUpdater: snapshotIndex: updateIncreasingly
205866698 -> 206114616
...
2023-10-24 12:02:58,532 [om2-impl-thread1] INFO
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
om2@group-197E298202B9-SegmentedRaftLogWorker: flushIndex: setUnconditionally 0
-> 405823932
2023-10-24 12:02:58,532 [om2-impl-thread1] INFO
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
om2@group-197E298202B9-SegmentedRaftLogWorker: safeCacheEvictIndex:
setUnconditionally 0 -> 206114616
2023-10-24 12:02:58,619 [om2-impl-thread1] INFO
org.apache.ratis.server.raftlog.RaftLog:
om2@group-197E298202B9-SegmentedRaftLog: commitIndex: updateToMax
old=206114616, new=405823931, updated? true
2023-10-24 12:02:58,619 [om2-impl-thread1] INFO
org.apache.ratis.server.raftlog.RaftLog:
om2@group-197E298202B9-SegmentedRaftLog: purgeIndex: updateIncreasingly -1 ->
206094557
{code}
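The scale of the replay can be read directly off the log lines above; as a quick sanity check of the gap (index values copied from the log):

```java
/** Quick arithmetic on the indexes from the log above. */
public class ReplayGapCheck {
    public static void main(String[] args) {
        long snapshotIndex = 206114616L; // "Took a snapshot at index 206114616"
        long commitIndex = 405823931L;   // "commitIndex: updateToMax ... new=405823931"
        // roughly 200 million log entries to replay on the next start
        System.out.println(commitIndex - snapshotIndex); // prints 199709315
    }
}
```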
The third is an OM start failure due to a Raft log replay failure:
{code:java}
2023-10-24 12:02:59,086 [OM StateMachine ApplyTransaction Thread - 0] ERROR
org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: Terminating with
exit status 1: Request cmdType: RenameKey
traceID: ""
clientId: "client-9300289B92FF"
userInfo {
userName: "weisuonan"
remoteAddress: "10.83.99.62"
hostName: "bigdata-apache-tool1.nmg01.diditaxi.com"
}
version: 3
renameKeyRequest {
keyArgs {
volumeName: "vol-bigdata-offline"
bucketName: "weisuonan"
keyName: "checkative/english-book_Ozone"
dataSize: 0
modificationTime: 1697627113034
}
toKeyName:
".Trash/weisuonan/Current/vol-bigdata-offline/weisuonan/checkative/english-book_Ozone1697627113026"
}
failed with exception
java.lang.IllegalArgumentException: Trying to set updateID to 206114651 which
is not greater than the current value of 405823931 for
OMKeyInfo{volume='vol-bigdata-offline', bucket='weisuonan',
key='checkative/english-book_Ozone', dataSize='23124',
creationTime='1698118516546', objectID='-9223371932963850495',
parentID='-9223372036604311807', replication='RATIS/THREE', fileChecksum='null}
at
org.apache.hadoop.ozone.om.helpers.WithObjectID.setUpdateID(WithObjectID.java:107)
at
org.apache.hadoop.ozone.om.request.key.OMKeyRenameRequestWithFSO.renameKey(OMKeyRenameRequestWithFSO.java:286)
at
org.apache.hadoop.ozone.om.request.key.OMKeyRenameRequestWithFSO.validateAndUpdateCache(OMKeyRenameRequestWithFSO.java:219)
at
org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:382)
at
org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:572)
at
org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:362)
at
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
After analysis, all the issues share one root cause. HDDS-2680 introduced
logic in OzoneManagerStateMachine to calculate the lastAppliedTermIndex based
on two maps, applyTransactionMap and ratisTransactionMap. Every write request
coming from RATIS through applyTransaction adds its trxLogIndex to
applyTransactionMap, and every write request flushed by
OzoneManagerDoubleBuffer#flushBatch has its trxLogIndex removed from
applyTransactionMap when flushBatch calls
ozoneManagerRatisSnapShot.updateLastAppliedIndex(flushedEpochs).
If a write request from RATIS never goes through
OzoneManagerDoubleBuffer#flushBatch, its trxLogIndex is left in
applyTransactionMap forever. Since lastAppliedIndex can only be updated
incrementally, any trxLogIndex not confirmed by an OzoneManagerDoubleBuffer
flush stops lastAppliedIndex from growing past it. Write requests after that
unconfirmed one can still be flushed, but their trxLogIndexes are added to
ratisTransactionMap, which therefore grows bigger and bigger. This explains
both the full GC and the huge gap between the snapshot index and the commit
index.
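The contiguous-advance behavior can be modeled with a minimal sketch. This is a simplified stand-in, not the actual Ozone code; the class name and method signatures here are hypothetical, only the two map names mirror the real fields:

```java
import java.util.TreeSet;

/**
 * Minimal sketch of the two-map bookkeeping: lastAppliedIndex may only grow
 * contiguously, so a single trxLogIndex that is never confirmed by the
 * double-buffer flush pins it forever, while later flushed indexes pile up
 * in ratisTransactionMap.
 */
public class LastAppliedIndexSketch {
    // indexes handed to applyTransaction but not yet confirmed by a flush
    private final TreeSet<Long> applyTransactionMap = new TreeSet<>();
    // flushed indexes that cannot yet be folded into lastAppliedIndex
    private final TreeSet<Long> ratisTransactionMap = new TreeSet<>();
    private long lastAppliedIndex = -1;

    void applyTransaction(long trxLogIndex) {
        applyTransactionMap.add(trxLogIndex);
    }

    // called when the double buffer flushes the given trxLogIndex
    void updateLastAppliedIndex(long trxLogIndex) {
        applyTransactionMap.remove(trxLogIndex);
        ratisTransactionMap.add(trxLogIndex);
        // advance only while the next expected index has been flushed
        while (ratisTransactionMap.remove(lastAppliedIndex + 1)) {
            lastAppliedIndex++;
        }
    }

    long getLastAppliedIndex() { return lastAppliedIndex; }
    int ratisMapSize() { return ratisTransactionMap.size(); }

    public static void main(String[] args) {
        LastAppliedIndexSketch s = new LastAppliedIndexSketch();
        for (long i = 0; i <= 4; i++) {
            s.applyTransaction(i);
        }
        // index 1 is never flushed (e.g. its request failed in runCommand)
        s.updateLastAppliedIndex(0);
        s.updateLastAppliedIndex(2);
        s.updateLastAppliedIndex(3);
        s.updateLastAppliedIndex(4);
        // lastAppliedIndex stalls at 0; indexes 2..4 linger in ratisTransactionMap
        System.out.println(s.getLastAppliedIndex()); // prints 0
        System.out.println(s.ratisMapSize());        // prints 3
    }
}
```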
How can a write request not be confirmed by the OzoneManagerDoubleBuffer
flush? Here is one case reproduced locally.
T1: create bucket1
T2: client1 sends a delete bucket "bucket1" request to the OM. The OM verifies
that bucket1 exists, then sends the request to RATIS to handle.
T3: client2 sends a create key "bucket1/key1" request to the OM. The OM
verifies that bucket1 exists, then sends the request to RATIS.
T4: OzoneManagerStateMachine executes delete bucket "bucket1" successfully and
returns a response to client1.
T5: OzoneManagerStateMachine executes the create key "bucket1/key1" request;
"bucket1" cannot be found, so the execution fails and a failure is returned to
client2.
In T5, the failure stack is:
{code:java}
2023-10-18 19:04:10,131 [OM StateMachine ApplyTransaction Thread - 0] WARN
org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine: Failed to write,
Exception occurred
BUCKET_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Bucket not
found: s3v/prod-voyager
at
org.apache.hadoop.ozone.om.OzoneManagerUtils.reportNotFound(OzoneManagerUtils.java:87)
at
org.apache.hadoop.ozone.om.OzoneManagerUtils.getBucketInfo(OzoneManagerUtils.java:72)
at
org.apache.hadoop.ozone.om.OzoneManagerUtils.resolveBucketInfoLink(OzoneManagerUtils.java:148)
at
org.apache.hadoop.ozone.om.OzoneManagerUtils.getResolvedBucketInfo(OzoneManagerUtils.java:124)
at
org.apache.hadoop.ozone.om.OzoneManagerUtils.getBucketLayout(OzoneManagerUtils.java:106)
at
org.apache.hadoop.ozone.om.request.BucketLayoutAwareOMKeyRequestFactory.createRequest(BucketLayoutAwareOMKeyRequestFactory.java:230)
at
org.apache.hadoop.ozone.om.ratis.utils.OzoneManagerRatisUtils.createClientRequest(OzoneManagerRatisUtils.java:336)
at
org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:380)
at
org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:572)
at
org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:362)
at
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
In OzoneManagerStateMachine.runCommand, when an IOException is thrown from
OzoneManagerRequestHandler.handleWriteRequest, runCommand constructs and
returns an OMResponse to the client but does not add the response to
OzoneManagerDoubleBuffer, so OzoneManagerDoubleBuffer is not aware of this
request or its trxLogIndex. The consequence is that this trxLogIndex stays in
applyTransactionMap forever. The failed OM's logs confirmed this.
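A hedged toy model of this control flow (all names and signatures below are hypothetical simplifications, not the actual Ozone code) shows how the failed request's trxLogIndex gets stranded:

```java
import java.io.IOException;
import java.util.TreeSet;

/**
 * Toy model of the runCommand flow: on IOException the error response is
 * returned directly, the transaction never enters the double buffer, and
 * its index is therefore never confirmed by a flush.
 */
public class RunCommandSketch {
    // indexes recorded when applyTransaction runs the request
    private final TreeSet<Long> applyTransactionMap = new TreeSet<>();
    // responses queued for the double-buffer flush
    private final TreeSet<Long> doubleBuffer = new TreeSet<>();

    String runCommand(String request, long trxLogIndex) {
        applyTransactionMap.add(trxLogIndex);
        try {
            String response = handleWriteRequest(request);
            doubleBuffer.add(trxLogIndex); // success: flush will confirm it
            return response;
        } catch (IOException e) {
            // bug: the error response short-circuits; trxLogIndex never
            // reaches the double buffer, so it is never confirmed
            return "ERROR: " + e.getMessage();
        }
    }

    private String handleWriteRequest(String request) throws IOException {
        if (request.contains("missing-bucket")) {
            throw new IOException("BUCKET_NOT_FOUND");
        }
        return "OK";
    }

    void flushBatch() {
        // a flush confirms only indexes that entered the double buffer
        applyTransactionMap.removeAll(doubleBuffer);
        doubleBuffer.clear();
    }

    TreeSet<Long> pendingIndexes() { return applyTransactionMap; }

    public static void main(String[] args) {
        RunCommandSketch s = new RunCommandSketch();
        s.runCommand("createKey missing-bucket/key1", 1); // fails with IOException
        s.runCommand("createKey bucket2/key2", 2);        // succeeds
        s.flushBatch();
        // index 1 is stranded in applyTransactionMap
        System.out.println(s.pendingIndexes()); // prints [1]
    }
}
```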
> OM restart failed due to transactionLogIndex smaller than current updateID
> --------------------------------------------------------------------------
>
> Key: HDDS-9342
> URL: https://issues.apache.org/jira/browse/HDDS-9342
> Project: Apache Ozone
> Issue Type: Bug
> Components: OM, OM HA
> Affects Versions: 1.3.0
> Reporter: Hongbing Wang
> Priority: Critical
> Attachments: HDDS-9342_testUpdateId.patch,
> HDDS-9342_testUpdateId_reproduce.patch, clipboard_image_1700795744614.png,
> om.shutdown-20230922.log
>
>
> OM restart failed, log as follows:
> create failed:
> {noformat}
> java.lang.IllegalArgumentException: Trying to set updateID to 2901863625
> which is not greater than the current value of 2901863627 for
> OMKeyInfo{volume='vol-xxx', bucket='xxx', key='user/xxx/platform/xxx',
> dataSize='268435456', creationTime='1695088210914',
> objectID='-9223371293977687808', parentID='0', replication='RATIS/THREE',
> fileChecksum='null}
> at
> org.apache.hadoop.ozone.om.helpers.WithObjectID.setUpdateID(WithObjectID.java:105)
> at
> org.apache.hadoop.ozone.om.request.key.OMKeyRequest.prepareFileInfo(OMKeyRequest.java:665)
> at
> org.apache.hadoop.ozone.om.request.key.OMKeyRequest.prepareKeyInfo(OMKeyRequest.java:623)
> at
> org.apache.hadoop.ozone.om.request.file.OMFileCreateRequest.validateAndUpdateCache(OMFileCreateRequest.java:255)
> at
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:311)
> at
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRouterRequestHandler.handleWriteRequest(OzoneManagerRouterRequestHandler.java:806)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:535)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:326)
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> rename failed:
> {noformat}
> java.lang.IllegalArgumentException: Trying to set updateID to 2901863669
> which is not greater than the current value of 3076345041 for
> OMKeyInfo{volume='vol-xxx', bucket='xxx', key='checkative/xxx',
> dataSize='23124', creationTime='1695380440059',
> objectID='-9223371249310446848', parentID='0', replication='RATIS/THREE',
> fileChecksum='null}
> at
> org.apache.hadoop.ozone.om.helpers.WithObjectID.setUpdateID(WithObjectID.java:105)
> at
> org.apache.hadoop.ozone.om.request.key.OMKeyRenameRequest.validateAndUpdateCache(OMKeyRenameRequest.java:190)
> at
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:311)
> at
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRouterRequestHandler.handleWriteRequest(OzoneManagerRouterRequestHandler.java:806)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:535)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:326)
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]