[ https://issues.apache.org/jira/browse/HDFS-16013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340866#comment-17340866 ]
Stephen O'Donnell commented on HDFS-16013:
------------------------------------------

You want these ones:

HDFS-15406. Improve the speed of Datanode Block Scan. Contributed by hemanthboyina
HDFS-15574. Remove unnecessary sort of block list in DirectoryScanner. Contributed by Stephen O'Donnell.
HDFS-15583. Backport DirectoryScanner improvements HDFS-14476, HDFS-14751 and HDFS-15048 to branch 3.2 and 3.1. Contributed by Stephen O'Donnell
HDFS-15415. Reduce locking in Datanode DirectoryScanner. Contributed by Stephen O'Donnell

The last one is key, as it removes the lock completely, but the first one above makes a big difference to the speed too.

> DirectoryScan operation holds dataset lock for long time
> --------------------------------------------------------
>
>                 Key: HDFS-16013
>                 URL: https://issues.apache.org/jira/browse/HDFS-16013
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Renukaprasad C
>            Priority: Critical
>
> Environment: 3-node cluster with around 2M files and the same number of blocks.
> All file operations are normal; only the directory scan uses more memory and
> causes long GC pauses. This directory scan runs every 6 hours (default value),
> which causes slow responses to any file operation. The delay is around 5-8
> seconds (in production this delay grew to 30+ seconds with 8M blocks).
>
> GC Configuration:
> -Xms6144M
> -Xmx12288M /8G
> -XX:NewSize=614M
> -XX:MaxNewSize=1228M
> -XX:MetaspaceSize=128M
> -XX:MaxMetaspaceSize=128M
> -XX:CMSFullGCsBeforeCompaction=1
> -XX:MaxDirectMemorySize=1G
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:+UseCMSCompactAtFullCollection
> -XX:CMSInitiatingOccupancyFraction=80
>
> We also tried G1 GC, but couldn't find much difference in the result.
> -XX:+UseG1GC
> -XX:MaxGCPauseMillis=200
> -XX:InitiatingHeapOccupancyPercent=45
> -XX:G1ReservePercent=10
>
> {code:java}
> 2021-05-07 16:32:23,508 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool BP-345634799-<IP>-1619695417333 Total blocks: 2767211, missing metadata files: 22, missing block files: 22, missing blocks in memory: 0, mismatched blocks: 0
> 2021-05-07 16:32:23,508 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Lock held time above threshold: lock identifier: FsDatasetRWLock lockHeldTimeMs=7061 ms. Suppressed 0 lock warnings. The stack trace is:
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> org.apache.hadoop.util.InstrumentedReadLock.unlock(InstrumentedReadLock.java:78)
> org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
> org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:539)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:416)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:359)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}
>
> We have the following Jiras
> in our code already, but we are still seeing long lock hold times:
> - https://issues.apache.org/jira/browse/HDFS-15621
> - https://issues.apache.org/jira/browse/HDFS-15150
> - https://issues.apache.org/jira/browse/HDFS-15160
> - https://issues.apache.org/jira/browse/HDFS-13947
>
> cc: [~brahma] [~belugabehr] [~sodonnell] [~ayushsaxena] [~weichiu]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
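Independent of the patches listed above, the DirectoryScanner can also be spaced out or throttled through hdfs-site.xml. The property names dfs.datanode.directoryscan.interval (seconds between scans, 21600 = the 6-hour default mentioned in the report) and dfs.datanode.directoryscan.throttle.limit.ms.per.sec (how many milliseconds of each second the scanner may run) are the standard DataNode settings; the values below are purely illustrative, not tuning advice for this cluster:

{code:xml}
<!-- hdfs-site.xml: illustrative values only -->
<property>
  <name>dfs.datanode.directoryscan.interval</name>
  <!-- run the scan every 12 hours instead of the 6-hour default -->
  <value>43200</value>
</property>
<property>
  <name>dfs.datanode.directoryscan.throttle.limit.ms.per.sec</name>
  <!-- let the scanner thread run at most 500 ms out of every second -->
  <value>500</value>
</property>
{code}

Throttling spreads the same work over a longer wall-clock window, which lowers the memory and GC spikes but does not by itself shorten the time the dataset lock is held; the patches above address that part.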
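To make the effect of HDFS-15415 concrete, here is a minimal sketch (not the actual Hadoop code; the class name ScanSketch and the use of plain String block IDs are illustrative assumptions) of the lock-reduction idea: instead of holding the dataset lock for the whole scan-and-compare, take it only long enough to snapshot the in-memory block list, then run the slow comparison against the on-disk listing outside the lock.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Illustrative sketch of the HDFS-15415 idea: hold the dataset lock
 * only for a cheap snapshot, and run the expensive diff unlocked.
 * Block IDs are plain Strings here; the real code works on replica
 * objects per block pool.
 */
public class ScanSketch {
    private final ReentrantReadWriteLock datasetLock = new ReentrantReadWriteLock();
    private final Set<String> memoryBlocks = new HashSet<>();

    /** Register a block in the in-memory view (write-locked). */
    public void addBlock(String id) {
        datasetLock.writeLock().lock();
        try {
            memoryBlocks.add(id);
        } finally {
            datasetLock.writeLock().unlock();
        }
    }

    /** Blocks present in memory but missing from the on-disk listing. */
    public Set<String> missingOnDisk(Set<String> diskBlocks) {
        // Lock held only for the snapshot copy, not for the diff.
        Set<String> snapshot;
        datasetLock.readLock().lock();
        try {
            snapshot = new HashSet<>(memoryBlocks);
        } finally {
            datasetLock.readLock().unlock();
        }
        // The potentially slow comparison runs without the lock,
        // so file operations are not blocked during the scan.
        snapshot.removeAll(diskBlocks);
        return snapshot;
    }
}
```

With a multi-million-entry block map, the diff dominates the scan time, so moving it outside the lock is what eliminates the multi-second "Lock held time above threshold" warnings seen in the log.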