[ https://issues.apache.org/jira/browse/HDDS-9342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796994#comment-17796994 ]
Sumit Agrawal commented on HDDS-9342:
-------------------------------------
[~szetszwo]
As discussed, we already persist lastAppliedTransaction in the DB, and it is
loaded during startup, so replay by itself should not be a problem.
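To make the expected startup behavior concrete, here is a minimal Java sketch. It assumes the persisted value is a single log index (terms omitted). The classes PersistedTxnInfo and StartupReplay are hypothetical and are not the actual OM code. The sketch only shows that replay starts right after whatever index the DB holds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the persisted lastAppliedTransaction record in the OM DB.
class PersistedTxnInfo {
    private final long lastAppliedIndex;

    PersistedTxnInfo(long lastAppliedIndex) {
        this.lastAppliedIndex = lastAppliedIndex;
    }

    long load() {
        return lastAppliedIndex;
    }
}

class StartupReplay {
    // On startup, only log indices strictly greater than the persisted
    // lastAppliedTransaction index are re-applied. Everything at or below it
    // is assumed to be reflected in the DB already.
    static List<Long> indicesToReplay(long lastLogIndex, PersistedTxnInfo db) {
        List<Long> replay = new ArrayList<>();
        for (long i = db.load() + 1; i <= lastLogIndex; i++) {
            replay.add(i);
        }
        return replay;
    }
}
```

As long as the DB value is correct, already-applied transactions are never replayed.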
While checking the snapshot code, I found that we overwrite
lastAppliedTransaction in the DB with the in-memory value (which can be stale):
org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine#takeSnapshot
IMO, this is the cause of this issue: OM restarted from an old transactionId
instead of the expected one from the DB. When taking a snapshot, it should not
use the in-memory value.
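A minimal Java sketch of the failure mode described above, under simplifying assumptions. TxnInfoTable, SnapshotSketch and KeySketch are hypothetical. KeySketch only loosely mirrors the updateID guard in WithObjectID.setUpdateID. Here it simply rejects an updateID lower than the current one, which is what both logged failures show. The exact comparison in the real method may differ.

```java
// Hypothetical stand-in for the transaction info stored in the OM DB.
class TxnInfoTable {
    private long lastAppliedIndex;

    long get() { return lastAppliedIndex; }

    void put(long index) { lastAppliedIndex = index; }
}

class SnapshotSketch {
    // Problematic pattern: the in-memory index is written over the DB value,
    // even when it is older than what the DB already holds.
    static long takeSnapshotFromMemory(long inMemoryIndex, TxnInfoTable db) {
        db.put(inMemoryIndex);
        return inMemoryIndex;
    }

    // Suggested direction: report the index already persisted in the DB
    // and leave the DB value untouched.
    static long takeSnapshotFromDb(TxnInfoTable db) {
        return db.get();
    }
}

// Simplified updateID guard: an updateID lower than the current one is rejected.
class KeySketch {
    private long updateId;

    void setUpdateId(long newId) {
        if (newId < updateId) {
            throw new IllegalArgumentException("updateID " + newId
                + " is lower than current value " + updateId);
        }
        updateId = newId;
    }

    long getUpdateId() { return updateId; }
}
```

Take an example where the DB holds index 100 and a key was last modified by transaction 100. A snapshot taken with a stale in-memory index of 95 rewinds the DB value. After restart, replaying transaction 97 against that key calls setUpdateId(97) and is rejected, which has the same shape as the create and rename failures in the description.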
Also, regarding the logic for updating lastTransactionId that we discussed,
[~szetszwo]: to simplify it, the value should be taken only from the DB as the
last flushed epoch; there is no need to advance it in sequential order.
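A minimal Java sketch contrasting the two approaches, under stated assumptions. SequentialTracker advances lastTransactionId only across a contiguous run of completed indices, because transactions may finish out of order. FlushTracker takes the highest index recorded by each DB flush. Both classes are hypothetical illustrations. They are not the current OzoneManagerStateMachine logic.

```java
import java.util.TreeSet;

// Hypothetical sequential tracker: lastApplied only advances across a
// contiguous run of completed indices, so gaps must be remembered.
class SequentialTracker {
    private long lastApplied;
    private final TreeSet<Long> pending = new TreeSet<>();

    SequentialTracker(long start) { lastApplied = start; }

    void completed(long index) {
        pending.add(index);
        // Advance while the smallest pending index is the next expected one.
        while (!pending.isEmpty() && pending.first() == lastApplied + 1) {
            lastApplied = pending.pollFirst();
        }
    }

    long lastApplied() { return lastApplied; }
}

// Hypothetical flush-based tracker: each DB batch flush reports the highest
// index it persisted, and that value is taken as-is (never moving backward).
class FlushTracker {
    private long lastFlushed;

    FlushTracker(long start) { lastFlushed = start; }

    void onBatchFlushed(long highestIndexInBatch) {
        lastFlushed = Math.max(lastFlushed, highestIndexInBatch);
    }

    long lastApplied() { return lastFlushed; }
}
```

The flush-based version needs no gap bookkeeping. It relies on the assumption that a flush only records an index once every transaction up to that index has been written. That property is assumed here and has not been verified against the code.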
cc: [~Sammi]
> OM restart failed due to transactionLogIndex smaller than current updateID
> --------------------------------------------------------------------------
>
> Key: HDDS-9342
> URL: https://issues.apache.org/jira/browse/HDDS-9342
> Project: Apache Ozone
> Issue Type: Bug
> Components: OM, OM HA
> Affects Versions: 1.3.0
> Reporter: Hongbing Wang
> Assignee: Sammi Chen
> Priority: Critical
> Attachments: HDDS-9342_testUpdateId.patch,
> HDDS-9342_testUpdateId_reproduce.patch, clipboard_image_1700795744614.png,
> om.shutdown-20230922.log
>
>
> OM restart failed with the following logs:
> create failed:
> {noformat}
> java.lang.IllegalArgumentException: Trying to set updateID to 2901863625
> which is not greater than the current value of 2901863627 for
> OMKeyInfo{volume='vol-xxx', bucket='xxx', key='user/xxx/platform/xxx',
> dataSize='268435456', creationTime='1695088210914',
> objectID='-9223371293977687808', parentID='0', replication='RATIS/THREE',
> fileChecksum='null}
> at org.apache.hadoop.ozone.om.helpers.WithObjectID.setUpdateID(WithObjectID.java:105)
> at org.apache.hadoop.ozone.om.request.key.OMKeyRequest.prepareFileInfo(OMKeyRequest.java:665)
> at org.apache.hadoop.ozone.om.request.key.OMKeyRequest.prepareKeyInfo(OMKeyRequest.java:623)
> at org.apache.hadoop.ozone.om.request.file.OMFileCreateRequest.validateAndUpdateCache(OMFileCreateRequest.java:255)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:311)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerRouterRequestHandler.handleWriteRequest(OzoneManagerRouterRequestHandler.java:806)
> at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:535)
> at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:326)
> at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> rename failed:
> {noformat}
> java.lang.IllegalArgumentException: Trying to set updateID to 2901863669
> which is not greater than the current value of 3076345041 for
> OMKeyInfo{volume='vol-xxx', bucket='xxx', key='checkative/xxx',
> dataSize='23124', creationTime='1695380440059',
> objectID='-9223371249310446848', parentID='0', replication='RATIS/THREE',
> fileChecksum='null}
> at org.apache.hadoop.ozone.om.helpers.WithObjectID.setUpdateID(WithObjectID.java:105)
> at org.apache.hadoop.ozone.om.request.key.OMKeyRenameRequest.validateAndUpdateCache(OMKeyRenameRequest.java:190)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:311)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerRouterRequestHandler.handleWriteRequest(OzoneManagerRouterRequestHandler.java:806)
> at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:535)
> at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:326)
> at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)