[ 
https://issues.apache.org/jira/browse/HDDS-9342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796994#comment-17796994
 ] 

Sumit Agrawal commented on HDDS-9342:
-------------------------------------

[~szetszwo] 

As discussed, already we keep lastAppliedTransaction in db, and get loaded 
during startup. So replay problem must not be there.

Checking code for snapshot, its found that we overwrite the 
lastAppliedTransaction in DB from memory (which can be old):

org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine#takeSnapshot

 

IMO, this seems to be the problem of this issue, that its started with old 
transactionId then that from DB as expected. while take snapshot, it should not 
use one from memory.

 

Also related to logic of update lastTransactionId as disccussed [~szetszwo] , 
it should only take from DB as last flushed epoch, no need go in sequential 
order to simplify the logic.

 

cc: [~Sammi] 

> OM restart failed due to transactionLogIndex smaller than current updateID
> --------------------------------------------------------------------------
>
>                 Key: HDDS-9342
>                 URL: https://issues.apache.org/jira/browse/HDDS-9342
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: OM, OM HA
>    Affects Versions: 1.3.0
>            Reporter: Hongbing Wang
>            Assignee: Sammi Chen
>            Priority: Critical
>         Attachments: HDDS-9342_testUpdateId.patch, 
> HDDS-9342_testUpdateId_reproduce.patch, clipboard_image_1700795744614.png, 
> om.shutdown-20230922.log
>
>
> OM restart failed, log as follow:
> create failed:
> {noformat}
> java.lang.IllegalArgumentException: Trying to set updateID to 2901863625 
> which is not greater than the current value of 2901863627 for 
> OMKeyInfo{volume='vol-xxx', bucket='xxx', key='user/xxx/platform/xxx', 
> dataSize='268435456', creationTime='1695088210914', 
> objectID='-9223371293977687808', parentID='0', replication='RATIS/THREE', 
> fileChecksum='null}
>         at 
> org.apache.hadoop.ozone.om.helpers.WithObjectID.setUpdateID(WithObjectID.java:105)
>         at 
> org.apache.hadoop.ozone.om.request.key.OMKeyRequest.prepareFileInfo(OMKeyRequest.java:665)
>         at 
> org.apache.hadoop.ozone.om.request.key.OMKeyRequest.prepareKeyInfo(OMKeyRequest.java:623)
>         at 
> org.apache.hadoop.ozone.om.request.file.OMFileCreateRequest.validateAndUpdateCache(OMFileCreateRequest.java:255)
>         at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:311)
>         at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRouterRequestHandler.handleWriteRequest(OzoneManagerRouterRequestHandler.java:806)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:535)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:326)
>         at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> rename failed:
> {noformat}
> java.lang.IllegalArgumentException: Trying to set updateID to 2901863669 
> which is not greater than the current value of 3076345041 for 
> OMKeyInfo{volume='vol-xxx', bucket='xxx', key='checkative/xxx', 
> dataSize='23124', creationTime='1695380440059', 
> objectID='-9223371249310446848', parentID='0', replication='RATIS/THREE', 
> fileChecksum='null}
>         at 
> org.apache.hadoop.ozone.om.helpers.WithObjectID.setUpdateID(WithObjectID.java:105)
>         at 
> org.apache.hadoop.ozone.om.request.key.OMKeyRenameRequest.validateAndUpdateCache(OMKeyRenameRequest.java:190)
>         at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:311)
>         at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRouterRequestHandler.handleWriteRequest(OzoneManagerRouterRequestHandler.java:806)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:535)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:326)
>         at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to