[ 
https://issues.apache.org/jira/browse/HDDS-9342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796215#comment-17796215
 ] 

Sumit Agrawal commented on HDDS-9342:
-------------------------------------

[~Sammi] 

IMO for solution of replay transaction,
 * Need keep last applied transaction in DB as DB flush
 * when apply transaction for transaction Index lesser than one from DB, just 
return success reply with no-ops (and a warn log of skipping the transaction as 
its already applied to DB)

This will help to avoid replay of samething again to DB.

 

Related to lastAppliedTermIndex update,
 * We can just update lastApplied Transaction based on DB flush id (as truth as 
applied) only
 * can remove any update for lastAppliedTransaction from ratis, can be used to 
log warn if different from DB last applied and ratis crosses certain threshold 
to notify the difference pending or metrics
 * can remove ratisTransactionMap as seems do not have any usages

 

 

I think based on this, no other changes required and can resolve the problem as 
faced above, and simplify the logic.

cc: [~szetszwo] 

> OM restart failed due to transactionLogIndex smaller than current updateID
> --------------------------------------------------------------------------
>
>                 Key: HDDS-9342
>                 URL: https://issues.apache.org/jira/browse/HDDS-9342
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: OM, OM HA
>    Affects Versions: 1.3.0
>            Reporter: Hongbing Wang
>            Assignee: Sammi Chen
>            Priority: Critical
>         Attachments: HDDS-9342_testUpdateId.patch, 
> HDDS-9342_testUpdateId_reproduce.patch, clipboard_image_1700795744614.png, 
> om.shutdown-20230922.log
>
>
> OM restart failed, log as follow:
> create failed:
> {noformat}
> java.lang.IllegalArgumentException: Trying to set updateID to 2901863625 
> which is not greater than the current value of 2901863627 for 
> OMKeyInfo{volume='vol-xxx', bucket='xxx', key='user/xxx/platform/xxx', 
> dataSize='268435456', creationTime='1695088210914', 
> objectID='-9223371293977687808', parentID='0', replication='RATIS/THREE', 
> fileChecksum='null}
>         at 
> org.apache.hadoop.ozone.om.helpers.WithObjectID.setUpdateID(WithObjectID.java:105)
>         at 
> org.apache.hadoop.ozone.om.request.key.OMKeyRequest.prepareFileInfo(OMKeyRequest.java:665)
>         at 
> org.apache.hadoop.ozone.om.request.key.OMKeyRequest.prepareKeyInfo(OMKeyRequest.java:623)
>         at 
> org.apache.hadoop.ozone.om.request.file.OMFileCreateRequest.validateAndUpdateCache(OMFileCreateRequest.java:255)
>         at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:311)
>         at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRouterRequestHandler.handleWriteRequest(OzoneManagerRouterRequestHandler.java:806)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:535)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:326)
>         at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> rename failed:
> {noformat}
> java.lang.IllegalArgumentException: Trying to set updateID to 2901863669 
> which is not greater than the current value of 3076345041 for 
> OMKeyInfo{volume='vol-xxx', bucket='xxx', key='checkative/xxx', 
> dataSize='23124', creationTime='1695380440059', 
> objectID='-9223371249310446848', parentID='0', replication='RATIS/THREE', 
> fileChecksum='null}
>         at 
> org.apache.hadoop.ozone.om.helpers.WithObjectID.setUpdateID(WithObjectID.java:105)
>         at 
> org.apache.hadoop.ozone.om.request.key.OMKeyRenameRequest.validateAndUpdateCache(OMKeyRenameRequest.java:190)
>         at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:311)
>         at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRouterRequestHandler.handleWriteRequest(OzoneManagerRouterRequestHandler.java:806)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:535)
>         at 
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:326)
>         at 
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to