[
https://issues.apache.org/jira/browse/HDDS-9342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17795178#comment-17795178
]
Sumit Agrawal commented on HDDS-9342:
-------------------------------------
[~Sammi] This is good identification of potential problem.
>From above scenario, when we replay the transaction, db is having latest key,
>but ratis is replay the key creation again. The failure shows the operation is
>+*not idempotent.*+ Still there will be rare chance that, very frequent create
>key - delete key multiple time and crash after db flush but before update
>apply transaction index can cause this.
HDDS-9876 : Can reduce the occurence by making ratis applyTransaction index
updated removing this bug.
Also, *HDDS-9146:* Removed the risk of dataloss as reply of transaction for
commit may cause deletion of blocks thinking latest block in DB as already used.
We need relook considering fix for replay transaction and handling failure,
dataloss if any impact considering above scenario.
cc: [~szetszwo] [~ritesh]
> OM restart failed due to transactionLogIndex smaller than current updateID
> --------------------------------------------------------------------------
>
> Key: HDDS-9342
> URL: https://issues.apache.org/jira/browse/HDDS-9342
> Project: Apache Ozone
> Issue Type: Bug
> Components: OM, OM HA
> Affects Versions: 1.3.0
> Reporter: Hongbing Wang
> Assignee: Sammi Chen
> Priority: Critical
> Attachments: HDDS-9342_testUpdateId.patch,
> HDDS-9342_testUpdateId_reproduce.patch, clipboard_image_1700795744614.png,
> om.shutdown-20230922.log
>
>
> OM restart failed, log as follow:
> create failed:
> {noformat}
> java.lang.IllegalArgumentException: Trying to set updateID to 2901863625
> which is not greater than the current value of 2901863627 for
> OMKeyInfo{volume='vol-xxx', bucket='xxx', key='user/xxx/platform/xxx',
> dataSize='268435456', creationTime='1695088210914',
> objectID='-9223371293977687808', parentID='0', replication='RATIS/THREE',
> fileChecksum='null}
> at
> org.apache.hadoop.ozone.om.helpers.WithObjectID.setUpdateID(WithObjectID.java:105)
> at
> org.apache.hadoop.ozone.om.request.key.OMKeyRequest.prepareFileInfo(OMKeyRequest.java:665)
> at
> org.apache.hadoop.ozone.om.request.key.OMKeyRequest.prepareKeyInfo(OMKeyRequest.java:623)
> at
> org.apache.hadoop.ozone.om.request.file.OMFileCreateRequest.validateAndUpdateCache(OMFileCreateRequest.java:255)
> at
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:311)
> at
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRouterRequestHandler.handleWriteRequest(OzoneManagerRouterRequestHandler.java:806)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:535)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:326)
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> rename failed:
> {noformat}
> java.lang.IllegalArgumentException: Trying to set updateID to 2901863669
> which is not greater than the current value of 3076345041 for
> OMKeyInfo{volume='vol-xxx', bucket='xxx', key='checkative/xxx',
> dataSize='23124', creationTime='1695380440059',
> objectID='-9223371249310446848', parentID='0', replication='RATIS/THREE',
> fileChecksum='null}
> at
> org.apache.hadoop.ozone.om.helpers.WithObjectID.setUpdateID(WithObjectID.java:105)
> at
> org.apache.hadoop.ozone.om.request.key.OMKeyRenameRequest.validateAndUpdateCache(OMKeyRenameRequest.java:190)
> at
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:311)
> at
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRouterRequestHandler.handleWriteRequest(OzoneManagerRouterRequestHandler.java:806)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:535)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:326)
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]