[
https://issues.apache.org/jira/browse/HDDS-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Anu Engineer resolved HDDS-296.
-------------------------------
Resolution: Implemented
Fixed via 355,356,357,358...
> OMMetadataManagerLock is held by getPendingDeletionKeys for a full table scan
> -----------------------------------------------------------------------------
>
> Key: HDDS-296
> URL: https://issues.apache.org/jira/browse/HDDS-296
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Elek, Marton
> Assignee: Anu Engineer
> Priority: Critical
> Fix For: 0.2.1
>
> Attachments: local.png
>
>
> We identified the problem during freon tests on real clusters. I first saw it
> on a kubernetes-based pseudo cluster (50 datanodes, 1 freon). After a while
> the rate of key allocation slowed down (see the attached image).
> I could also reproduce the problem with a local cluster (I used the
> hadoop-dist/target/compose/ozoneperf setup). After the first 1 million keys,
> key creation almost stopped.
> With the help of [~nandakumar131] we identified that the problem is the lock
> in the ozone manager. (We profiled the OM with VisualVM and found that the
> code is blocked on the lock for an extremely long time; we also checked the
> rocksdb/rpc metrics from prometheus and everything else was working well.)
> [~nandakumar131] suggested using an instrumented lock in the OMMetadataManager.
> With a custom build we confirmed that the deletion service holds the
> OMMetadataManager lock for a full range scan. For 1 million keys the scan
> took about 10 seconds (on my local developer machine with an SSD):
> {code}
> ozoneManager_1 | 2018-07-25 12:45:03 WARN OMMetadataManager:143 - Lock held time above threshold: lock identifier: OMMetadataManagerLock lockHeldTimeMs=2648 ms. Suppressed 0 lock warnings. The stack trace is: java.lang.Thread.getStackTrace(Thread.java:1559)
> ozoneManager_1 | org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> ozoneManager_1 | org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> ozoneManager_1 | org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> ozoneManager_1 | org.apache.hadoop.util.InstrumentedReadLock.unlock(InstrumentedReadLock.java:78)
> ozoneManager_1 | org.apache.hadoop.ozone.om.KeyManagerImpl.getPendingDeletionKeys(KeyManagerImpl.java:506)
> ozoneManager_1 | org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:98)
> ozoneManager_1 | org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:85)
> ozoneManager_1 | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1 | java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ozoneManager_1 | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1 | java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ozoneManager_1 | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1 | java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> ozoneManager_1 | java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> ozoneManager_1 | java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ozoneManager_1 | java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> ozoneManager_1 | java.lang.Thread.run(Thread.java:748)
> {code}
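> The warning above comes from the instrumented lock wrapper: it records when
> the lock was acquired and, on unlock, logs a warning with the stack trace if
> the hold time exceeded a configured threshold. A minimal hand-rolled sketch
> of that mechanism (illustrative only, not the actual
> org.apache.hadoop.util.InstrumentedLock source):
> {code}
> // Sketch of the lock-held-time warning mechanism; not the Hadoop source.
> import java.util.concurrent.locks.ReentrantReadWriteLock;
>
> public class WarningReadLock {
>   private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
>   private final long warnThresholdMs;
>   // Acquire time per reader thread, so unlock can compute the hold time.
>   private final ThreadLocal<Long> acquiredAtMs = new ThreadLocal<>();
>
>   public WarningReadLock(long warnThresholdMs) {
>     this.warnThresholdMs = warnThresholdMs;
>   }
>
>   public void readLock() {
>     rwLock.readLock().lock();
>     acquiredAtMs.set(System.currentTimeMillis());
>   }
>
>   public void readUnlock() {
>     long heldMs = System.currentTimeMillis() - acquiredAtMs.get();
>     rwLock.readLock().unlock();
>     if (heldMs > warnThresholdMs) {
>       // Mirrors the "Lock held time above threshold" warning above.
>       System.err.println("Lock held time above threshold: lockHeldTimeMs="
>           + heldMs + " ms. The stack trace is:");
>       for (StackTraceElement e : Thread.currentThread().getStackTrace()) {
>         System.err.println("    " + e);
>       }
>     }
>   }
> }
> {code}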
> I checked it with the DeletionService disabled and everything worked well.
> The deletion service should be improved to work without long-term locking,
> e.g. by scanning in bounded batches as sketched below.
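> One possible shape of that improvement (an illustrative sketch, not the
> committed fix; the KeyStore interface and BATCH_SIZE below are assumed
> names): scan the pending-deletion keys in bounded batches and release the
> lock between batches, so foreground key allocation is never blocked for a
> full table scan.
> {code}
> // Sketch only: bound each scan under the lock to a small batch instead of
> // holding the OMMetadataManager lock for a full table scan.
> import java.util.ArrayList;
> import java.util.List;
> import java.util.concurrent.locks.ReentrantReadWriteLock;
>
> public class BatchedDeletionScan {
>   private static final int BATCH_SIZE = 1000; // assumed tuning knob
>
>   /** Hypothetical key-range reader standing in for the OM metadata store. */
>   interface KeyStore {
>     // Returns up to 'limit' deletion-pending keys strictly after
>     // 'startAfter' (null = from the beginning); empty list means done.
>     List<String> listPendingDeletions(String startAfter, int limit);
>   }
>
>   private final KeyStore store;
>   private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
>
>   BatchedDeletionScan(KeyStore store) {
>     this.store = store;
>   }
>
>   List<String> getPendingDeletionKeys() {
>     List<String> result = new ArrayList<>();
>     String cursor = null;
>     while (true) {
>       List<String> batch;
>       lock.readLock().lock();
>       try {
>         // Only a bounded slice of the table is read per lock hold.
>         batch = store.listPendingDeletions(cursor, BATCH_SIZE);
>       } finally {
>         // Writers (foreground key allocation) can run between batches.
>         lock.readLock().unlock();
>       }
>       if (batch.isEmpty()) {
>         return result;
>       }
>       result.addAll(batch);
>       cursor = batch.get(batch.size() - 1); // resume after the last key
>     }
>   }
> }
> {code}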