[
https://issues.apache.org/jira/browse/HDFS-16013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340866#comment-17340866
]
Stephen O'Donnell commented on HDFS-16013:
------------------------------------------
You want these ones:
HDFS-15406. Improve the speed of Datanode Block Scan. Contributed by
hemanthboyina
HDFS-15574. Remove unnecessary sort of block list in DirectoryScanner.
Contributed by Stephen O'Donnell.
HDFS-15583. Backport DirectoryScanner improvements HDFS-14476, HDFS-14751 and
HDFS-15048 to branch 3.2 and 3.1. Contributed by Stephen O'Donnell
HDFS-15415. Reduce locking in Datanode DirectoryScanner. Contributed by Stephen
O'Donnell
The last one is key, as it removes the lock completely, but the first one above
makes a big difference to the speed too.
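
To illustrate why dropping the lock around the scan helps, here is a minimal
sketch of the general idea: copy the in-memory block list under a short read
lock, then compare it against the on-disk listing with no lock held. This is
not the actual Hadoop code. The class SnapshotScanSketch and its methods are
hypothetical names made up for this example, and blocks are reduced to plain
IDs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; not DirectoryScanner or FsDatasetImpl.
public class SnapshotScanSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Stand-in for the in-memory replica map, kept sorted by block ID.
    private final TreeSet<Long> inMemoryBlocks = new TreeSet<>();

    // Writers still take the write lock, as normal block operations would.
    public void addBlock(long blockId) {
        lock.writeLock().lock();
        try {
            inMemoryBlocks.add(blockId);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // The lock is held only for the duration of the copy.
    private List<Long> snapshotBlocks() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(inMemoryBlocks);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The comparison, which is the slow part with millions of blocks,
    // runs against the snapshot with no lock held.
    public List<Long> findMissingOnDisk(Set<Long> onDiskBlocks) {
        List<Long> snapshot = snapshotBlocks();
        List<Long> missing = new ArrayList<>();
        for (Long blockId : snapshot) {
            if (!onDiskBlocks.contains(blockId)) {
                missing.add(blockId);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        SnapshotScanSketch dataset = new SnapshotScanSketch();
        dataset.addBlock(1L);
        dataset.addBlock(2L);
        dataset.addBlock(3L);
        Set<Long> onDisk = new HashSet<>(Arrays.asList(1L, 3L));
        System.out.println("Missing on disk: " + dataset.findMissingOnDisk(onDisk));
    }
}
```

Because the snapshot can go stale while the comparison runs, any difference
found this way would need to be re-checked under the lock before acting on it.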
> DirectoryScan operation holds dataset lock for long time
> --------------------------------------------------------
>
> Key: HDFS-16013
> URL: https://issues.apache.org/jira/browse/HDFS-16013
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Renukaprasad C
> Priority: Critical
>
> Environment: 3-node cluster with around 2M files and the same number of blocks.
> All file operations are normal except during the directory scan, which uses
> more memory and causes some long GC pauses. The directory scan runs every 6
> hours (the default), which causes slow responses to file operations. The
> delay is around 5-8 seconds (in production it grew to 30+ seconds with 8M
> blocks).
> GC Configuration:
> -Xms6144M
> -Xmx12288M /8G
> -XX:NewSize=614M
> -XX:MaxNewSize=1228M
> -XX:MetaspaceSize=128M
> -XX:MaxMetaspaceSize=128M
> -XX:CMSFullGCsBeforeCompaction=1
> -XX:MaxDirectMemorySize=1G
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:+UseCMSCompactAtFullCollection
> -XX:CMSInitiatingOccupancyFraction=80
> We also tried G1 GC, but couldn't find much difference in the result.
> -XX:+UseG1GC
> -XX:MaxGCPauseMillis=200
> -XX:InitiatingHeapOccupancyPercent=45
> -XX:G1ReservePercent=10
> {code:java}
> 2021-05-07 16:32:23,508 INFO
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool
> BP-345634799-<IP>-1619695417333 Total blocks: 2767211, missing metadata
> files: 22, missing block files: 22, missing blocks in memory: 0, mismatched
> blocks: 0
> 2021-05-07 16:32:23,508 WARN
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Lock
> held time above threshold: lock identifier: FsDatasetRWLock
> lockHeldTimeMs=7061 ms. Suppressed 0 lock warnings. The stack trace is:
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> org.apache.hadoop.util.InstrumentedReadLock.unlock(InstrumentedReadLock.java:78)
> org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
> org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:539)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:416)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:359)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}
> We already have the following Jiras in our code, but are still seeing long
> lock hold times: https://issues.apache.org/jira/browse/HDFS-15621,
> https://issues.apache.org/jira/browse/HDFS-15150,
> https://issues.apache.org/jira/browse/HDFS-15160,
> https://issues.apache.org/jira/browse/HDFS-13947
> cc: [~brahma] [~belugabehr] [~sodonnell] [~ayushsaxena] [~weichiu]