Renukaprasad C created HDFS-16013:
-------------------------------------
Summary: DirectoryScan operation holds dataset lock for long time
Key: HDFS-16013
URL: https://issues.apache.org/jira/browse/HDFS-16013
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Renukaprasad C
Environment: 3-node cluster with around 2M files and the same number of blocks.
All file operations are normal; only the directory scan consumes extra memory
and causes some long GC pauses. This directory scan runs every 6H (default
value), which causes slow responses to any file operation. The delay is around
5-8 seconds (in production this delay increased to 30+ seconds with 8M blocks).
GC Configuration:
-Xms6144M
-Xmx12288M /8G
-XX:NewSize=614M
-XX:MaxNewSize=1228M
-XX:MetaspaceSize=128M
-XX:MaxMetaspaceSize=128M
-XX:CMSFullGCsBeforeCompaction=1
-XX:MaxDirectMemorySize=1G
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:+UseCMSCompactAtFullCollection
-XX:CMSInitiatingOccupancyFraction=80
We also tried G1 GC, but couldn't find much difference in the result.
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:InitiatingHeapOccupancyPercent=45
-XX:G1ReservePercent=10
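As a possible mitigation while the lock-hold issue is investigated, the scanner itself can be tuned in hdfs-site.xml: the scan interval and the per-second throttle on the scanner's runtime. This is only a sketch; the values below are illustrative, not a recommendation for this cluster:
{code:xml}
<!-- Run the directory scan less often than the 6H (21600s) default. -->
<property>
  <name>dfs.datanode.directoryscan.interval</name>
  <value>43200</value>
</property>
<!-- Limit how many ms per second the scanner thread may run
     (illustrative value; see hdfs-default.xml for semantics). -->
<property>
  <name>dfs.datanode.directoryscan.throttle.limit.ms.per.sec</name>
  <value>500</value>
</property>
{code}
Note the throttle limits the disk-walking phase of the scan; it does not by itself shorten the in-memory reconcile step that holds the dataset lock in the trace below.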
{code:java}
2021-05-07 16:32:23,508 INFO
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool
BP-345634799-<IP>-1619695417333 Total blocks: 2767211, missing metadata files:
22, missing block files: 22, missing blocks in memory: 0, mismatched blocks: 0
2021-05-07 16:32:23,508 WARN
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Lock held
time above threshold: lock identifier: FsDatasetRWLock lockHeldTimeMs=7061 ms.
Suppressed 0 lock warnings. The stack trace is:
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
org.apache.hadoop.util.InstrumentedReadLock.unlock(InstrumentedReadLock.java:78)
org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:539)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:416)
org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:359)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
{code}
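For context on the warning above: it comes from the instrumented-lock pattern, which timestamps the lock at acquisition and, at unlock, logs a warning when the hold time exceeds a threshold. A minimal standalone sketch of that pattern (this is not Hadoop's actual InstrumentedLock; class and method names here are illustrative):
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the "Lock held time above threshold" check: record the
// acquisition time, and on unlock warn if the lock was held too long.
public class InstrumentedLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final long thresholdMs;
    // Single-threaded sketch; the real implementation tracks this per holder.
    private long acquiredAtMs;
    private long warnings = 0;

    public InstrumentedLockSketch(long thresholdMs) {
        this.thresholdMs = thresholdMs;
    }

    public void lockRead() {
        lock.readLock().lock();
        acquiredAtMs = System.currentTimeMillis();
    }

    public void unlockRead() {
        long heldMs = System.currentTimeMillis() - acquiredAtMs;
        lock.readLock().unlock();
        if (heldMs > thresholdMs) {
            warnings++;
            System.out.println("Lock held time above threshold:"
                + " lockHeldTimeMs=" + heldMs + " ms.");
        }
    }

    public long getWarnings() {
        return warnings;
    }

    public static void main(String[] args) throws InterruptedException {
        InstrumentedLockSketch l = new InstrumentedLockSketch(50);
        l.lockRead();
        Thread.sleep(100); // simulate a long scan while holding the lock
        l.unlockRead();    // exceeds the 50 ms threshold, so a warning is logged
    }
}
{code}
The point of the stack trace is that the warning fires at unlock inside DirectoryScanner.scan(), i.e. the whole in-memory diff of ~2.7M blocks ran under the FsDatasetRWLock read lock.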
Our code already includes the following Jiras, but we are still facing long lock hold times:
- https://issues.apache.org/jira/browse/HDFS-15621
- https://issues.apache.org/jira/browse/HDFS-15150
- https://issues.apache.org/jira/browse/HDFS-15160
- https://issues.apache.org/jira/browse/HDFS-13947
cc: [~brahma] [~belugabehr] [~sodonnell] [~ayushsaxena] [~weichiu]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)