Pratyush Bhatt created HDDS-10649:
-------------------------------------
Summary: [LeaseRecovery] Auto Lease recovery failed when Hard limit is expired.
Key: HDDS-10649
URL: https://issues.apache.org/jira/browse/HDDS-10649
Project: Apache Ozone
Issue Type: Bug
Components: OM
Reporter: Pratyush Bhatt
Below are the hard limit and related configs:
{code:java}
ozone getconf -confKey ozone.om.lease.hard.limit
8m
ozone getconf -confKey ozone.om.open.key.cleanup.service.interval
5m
ozone getconf -confKey ozone.om.open.key.expire.threshold
6m{code}
Created a file {_}/hsyncvol/hsyncbuck/hsync/File_0.txt{_}, wrote some data to
it, called hsync, and then kept it open. The last modification was at
_2024-04-04T16:12:39_:
{code:java}
{
"volumeName" : "hsyncvol",
"bucketName" : "hsyncbuck",
"name" : "hsync/File_0.txt",
"dataSize" : 26214400,
"creationTime" : "2024-04-04T16:12:38.263Z",
"modificationTime" : "2024-04-04T16:12:39.660Z",
"replicationConfig" : {
"replicationFactor" : "THREE",
"requiredNodes" : 3,
"replicationType" : "RATIS"
},
"metadata" : {
"hsyncClientId" : "112213829764055054"
},
"ozoneKeyLocations" : [ {
"containerID" : 11,
"localID" : 113750153625603015,
"length" : 26214400,
"offset" : 0,
"keyOffset" : 0
} ],
"file" : true
} {code}
More than an hour later, the file is still in the OpenKeyTable:
{code:java}
> date
Thu Apr 4 17:22:06 UTC 2024
> ozone admin om lof --service-id=ozone1712158888 --prefix=/hsyncvol/hsyncbuck/
0 total open files (est.). Showing 1 open files (limit 100) under path prefix:
  /hsyncvol/hsyncbuck/
Client ID            Creation time    Hsync'ed    Open File Path
112213829764055054   1712247158263    Yes         /hsyncvol/hsyncbuck/-9223372036851973887/File_0.txt
Reached the end of the list. {code}
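To put numbers on the timeline, here is a small sketch that uses only values copied from the outputs above. It converts the "Creation time" column to UTC and compares the time since the last modification with the configured hard limit. The variable names are illustrative and not part of Ozone.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the report above.
creation_ms = 1712247158263  # "Creation time" column of the lof output
last_modified = datetime(2024, 4, 4, 16, 12, 39, 660000, tzinfo=timezone.utc)
checked_at = datetime(2024, 4, 4, 17, 22, 6, tzinfo=timezone.utc)  # `date` output
hard_limit = timedelta(minutes=8)  # ozone.om.lease.hard.limit

# Convert epoch milliseconds with integer arithmetic to avoid float rounding.
created = datetime.fromtimestamp(creation_ms // 1000, tz=timezone.utc) \
    + timedelta(milliseconds=creation_ms % 1000)

elapsed = checked_at - last_modified
overdue = elapsed - hard_limit

print("created      :", created.isoformat())
print("elapsed      :", elapsed)
print("past the hard limit by:", overdue)
```

The converted creation time matches the `creationTime` field in the key info, so both outputs refer to the same open file, and the time since the last write is well beyond the 8m hard limit.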
Checked the OM leader logs; errors like the following are logged every 5 minutes:
{code:java}
2024-04-04 17:18:17,437 ERROR [om74-OMStateMachineApplyTransactionThread - 0]-org.apache.hadoop.ozone.om.request.key.OMKeyCommitRequest: Key committed failed. Volume:hsyncvol, Bucket:hsyncbuck, Key:File_0.txt. Exception:{}
KEY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Failed to commit key, as /-9223372036851974912/-9223372036851974400/-9223372036851974400/File_0.txt/112213829764055054 entry is not found in the OpenKey table
at org.apache.hadoop.ozone.om.request.key.OMKeyCommitRequestWithFSO.validateAndUpdateCache(OMKeyCommitRequestWithFSO.java:163)
at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.lambda$0(OzoneManagerRequestHandler.java:406)
at org.apache.hadoop.util.MetricUtil.captureLatencyNs(MetricUtil.java:45)
at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequestImpl(OzoneManagerRequestHandler.java:404)
at org.apache.hadoop.ozone.protocolPB.RequestHandler.handleWriteRequest(RequestHandler.java:63)
at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:525)
at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:343)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
.
.
.
2024-04-04 17:23:17,436 ERROR [om74-OMStateMachineApplyTransactionThread - 0]-org.apache.hadoop.ozone.om.request.key.OMKeyCommitRequest: Key committed failed. Volume:hsyncvol, Bucket:hsyncbuck, Key:File_0.txt. Exception:{}
KEY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Failed to commit key, as /-9223372036851974912/-9223372036851974400/-9223372036851974400/File_0.txt/112213829764055054 entry is not found in the OpenKey table
at org.apache.hadoop.ozone.om.request.key.OMKeyCommitRequestWithFSO.validateAndUpdateCache(OMKeyCommitRequestWithFSO.java:163)
at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.lambda$0(OzoneManagerRequestHandler.java:406)
at org.apache.hadoop.util.MetricUtil.captureLatencyNs(MetricUtil.java:45)
at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequestImpl(OzoneManagerRequestHandler.java:404)
at org.apache.hadoop.ozone.protocolPB.RequestHandler.handleWriteRequest(RequestHandler.java:63)
at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:525)
at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:343)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
.
.
.
. {code}
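As a rough aid for reading these logs, the sketch below measures the gap between the two quoted ERROR entries and splits the key from the exception message so it can be compared with the path listed by {{ozone admin om lof}}. All strings are copied from the outputs above. The meaning assigned to each path segment (volume ID, bucket ID, parent ID, file name, client ID) is an assumption about the open-file key layout, not something the log itself states.

```python
from datetime import datetime, timedelta

# Timestamps of the two ERROR entries quoted above.
fmt = "%Y-%m-%d %H:%M:%S,%f"
first = datetime.strptime("2024-04-04 17:18:17,437", fmt)
second = datetime.strptime("2024-04-04 17:23:17,436", fmt)
gap = second - first
print("gap between errors:", gap)

# Key from the exception message and path from the lof output.
missing_key = ("/-9223372036851974912/-9223372036851974400/"
               "-9223372036851974400/File_0.txt/112213829764055054")
listed_path = "/hsyncvol/hsyncbuck/-9223372036851973887/File_0.txt"

# ASSUMPTION: segments are volume ID, bucket ID, parent ID, file name, client ID.
vol_id, bucket_id, parent_id, name, client_id = missing_key.strip("/").split("/")
# ASSUMPTION: the third segment of the lof path is the parent directory ID.
_, _, listed_parent, listed_name = listed_path.strip("/").split("/")

print("same file name     :", name == listed_name)
print("parent in commit   :", parent_id)
print("parent in lof path :", listed_parent)
print("commit parent equals bucket ID:", parent_id == bucket_id)
```

The gap is one millisecond short of 5 minutes, which lines up with the configured {{ozone.om.open.key.cleanup.service.interval}}. Under the layout assumption above, the key in the failed commit carries the bucket ID in the parent position, while the entry listed by lof sits under a different parent ID. That difference may be worth checking when investigating why the lookup fails.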
cc: [~weichiu] , [~Sammi] [~ashishk]