Pratyush Bhatt created HDDS-10649:
-------------------------------------

             Summary: [LeaseRecovery] Auto Lease recovery failed when Hard limit is expired.
                 Key: HDDS-10649
                 URL: https://issues.apache.org/jira/browse/HDDS-10649
             Project: Apache Ozone
          Issue Type: Bug
          Components: OM
            Reporter: Pratyush Bhatt


Below are the hard limit and related configs:
{code:java}
ozone getconf -confKey ozone.om.lease.hard.limit
8m
ozone getconf -confKey ozone.om.open.key.cleanup.service.interval
5m
ozone getconf -confKey ozone.om.open.key.expire.threshold
6m{code}

Created a file {_}/hsyncvol/hsyncbuck/hsync/File_0.txt{_}, wrote some data to it, called hsync, and then kept it open. The last modification was at _2024-04-04T16:12:39_:
{code:java}
{
  "volumeName" : "hsyncvol",
  "bucketName" : "hsyncbuck",
  "name" : "hsync/File_0.txt",
  "dataSize" : 26214400,
  "creationTime" : "2024-04-04T16:12:38.263Z",
  "modificationTime" : "2024-04-04T16:12:39.660Z",
  "replicationConfig" : {
    "replicationFactor" : "THREE",
    "requiredNodes" : 3,
    "replicationType" : "RATIS"
  },
  "metadata" : {
    "hsyncClientId" : "112213829764055054"
  },
  "ozoneKeyLocations" : [ {
    "containerID" : 11,
    "localID" : 113750153625603015,
    "length" : 26214400,
    "offset" : 0,
    "keyOffset" : 0
  } ],
  "file" : true
} {code}
It has been more than an hour and the file is still in the OpenKeyTable:
{code:java}
> date
Thu Apr  4 17:22:06 UTC 2024

> ozone admin om lof --service-id=ozone1712158888  --prefix=/hsyncvol/hsyncbuck/
0 total open files (est.). Showing 1 open files (limit 100) under path prefix:
  /hsyncvol/hsyncbuck/
Client ID             Creation time    Hsync'ed    Open File Path
112213829764055054    1712247158263    Yes         /hsyncvol/hsyncbuck/-9223372036851973887/File_0.txt
Reached the end of the list. {code}
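For reference, the timeline implied by the values above can be checked with a few lines of Java. This is only a sketch: the class and method names are hypothetical (not Ozone APIs), and it assumes the hard limit is measured from the last modification time, which the report does not confirm.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch only: hypothetical helper, not part of Ozone.
final class LeaseTimeline {
    // Values copied from the key info, `date`, and `lof` output in this report.
    static final Instant LAST_MODIFIED = Instant.parse("2024-04-04T16:12:39.660Z");
    static final Instant OBSERVED_AT = Instant.parse("2024-04-04T17:22:06Z");
    static final Duration HARD_LIMIT = Duration.ofMinutes(8); // ozone.om.lease.hard.limit
    static final long LOF_CREATION_MILLIS = 1712247158263L;   // "Creation time" column

    // Time elapsed between the last modification and the moment `date` was run.
    static Duration idleTime() {
        return Duration.between(LAST_MODIFIED, OBSERVED_AT);
    }

    // Assumption: the hard limit is counted from the last modification time.
    static Instant hardLimitExpiry() {
        return LAST_MODIFIED.plus(HARD_LIMIT);
    }

    public static void main(String[] args) {
        System.out.println("lof creation time: " + Instant.ofEpochMilli(LOF_CREATION_MILLIS));
        System.out.println("hard limit expiry: " + hardLimitExpiry());
        System.out.println("idle for:          " + idleTime());
        System.out.println("past hard limit:   " + (idleTime().compareTo(HARD_LIMIT) > 0));
    }
}
```

The epoch value in the `lof` output corresponds to the {{creationTime}} shown in the key info, and the file had been idle for roughly 69 minutes when the listing was taken, well beyond the 8m hard limit.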

Checked the OM leader logs; the error below is logged periodically, every 5 minutes:
{code:java}
2024-04-04 17:18:17,437 ERROR [om74-OMStateMachineApplyTransactionThread - 0]-org.apache.hadoop.ozone.om.request.key.OMKeyCommitRequest: Key committed failed. Volume:hsyncvol, Bucket:hsyncbuck, Key:File_0.txt. Exception:{}
KEY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Failed to commit key, as /-9223372036851974912/-9223372036851974400/-9223372036851974400/File_0.txt/112213829764055054 entry is not found in the OpenKey table
    at org.apache.hadoop.ozone.om.request.key.OMKeyCommitRequestWithFSO.validateAndUpdateCache(OMKeyCommitRequestWithFSO.java:163)
    at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.lambda$0(OzoneManagerRequestHandler.java:406)
    at org.apache.hadoop.util.MetricUtil.captureLatencyNs(MetricUtil.java:45)
    at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequestImpl(OzoneManagerRequestHandler.java:404)
    at org.apache.hadoop.ozone.protocolPB.RequestHandler.handleWriteRequest(RequestHandler.java:63)
    at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:525)
    at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:343)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
.
.
.

2024-04-04 17:23:17,436 ERROR [om74-OMStateMachineApplyTransactionThread - 0]-org.apache.hadoop.ozone.om.request.key.OMKeyCommitRequest: Key committed failed. Volume:hsyncvol, Bucket:hsyncbuck, Key:File_0.txt. Exception:{}
KEY_NOT_FOUND org.apache.hadoop.ozone.om.exceptions.OMException: Failed to commit key, as /-9223372036851974912/-9223372036851974400/-9223372036851974400/File_0.txt/112213829764055054 entry is not found in the OpenKey table
    at org.apache.hadoop.ozone.om.request.key.OMKeyCommitRequestWithFSO.validateAndUpdateCache(OMKeyCommitRequestWithFSO.java:163)
    at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.lambda$0(OzoneManagerRequestHandler.java:406)
    at org.apache.hadoop.util.MetricUtil.captureLatencyNs(MetricUtil.java:45)
    at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequestImpl(OzoneManagerRequestHandler.java:404)
    at org.apache.hadoop.ozone.protocolPB.RequestHandler.handleWriteRequest(RequestHandler.java:63)
    at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:525)
    at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:343)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

.
.
.
. {code}
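The spacing between the two failures quoted above can be measured directly from their timestamps. The snippet below is a rough sketch with a hypothetical class name. Comparing the gap with {{ozone.om.open.key.cleanup.service.interval}} (5m) is my own inference; the logs do not state which service issues the commit.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Sketch only: hypothetical helper, not part of Ozone.
final class CommitFailureInterval {
    // Timestamp layout of the OM log lines quoted in this report.
    static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Duration between two log timestamps given in the layout above.
    static Duration between(String earlier, String later) {
        return Duration.between(
                LocalDateTime.parse(earlier, LOG_TS),
                LocalDateTime.parse(later, LOG_TS));
    }

    public static void main(String[] args) {
        Duration gap = between("2024-04-04 17:18:17,437", "2024-04-04 17:23:17,436");
        // Assumption: 5m is the cleanup service interval from the configs above.
        Duration cleanupInterval = Duration.ofMinutes(5);
        System.out.println("gap between failures: " + gap.toMillis() + " ms");
        System.out.println("difference from 5m:   "
                + gap.minus(cleanupInterval).abs().toMillis() + " ms");
    }
}
```

The two failures are one millisecond short of 5 minutes apart, which is consistent with (though not proof of) the commit being retried on each run of the open key cleanup service.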
cc: [~weichiu], [~Sammi], [~ashishk]

 


