[
https://issues.apache.org/jira/browse/HIVE-26495?focusedWorklogId=803407&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-803407
]
ASF GitHub Bot logged work on HIVE-26495:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 25/Aug/22 00:05
Start Date: 25/Aug/22 00:05
Worklog Time Spent: 10m
Work Description: nareshpr opened a new pull request, #3549:
URL: https://github.com/apache/hive/pull/3549
What changes were proposed in this pull request?
Change fs.listStatus to fs.listStatusIterator for non-blocking execution.
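To illustrate the difference this change relies on, here is a minimal sketch that contrasts an eager listing with an incremental iterator. It deliberately uses stand-in types instead of the Hadoop API: `PagedStore`, `fetchPage`, and the page size are hypothetical, and the local `RemoteIterator` interface only mirrors the shape of Hadoop's `org.apache.hadoop.fs.RemoteIterator`. It is not the code in this pull request.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

class ListingSketch {

    /** Stand-in with the same shape as Hadoop's RemoteIterator (assumption: local copy). */
    interface RemoteIterator<T> {
        boolean hasNext() throws IOException;
        T next() throws IOException;
    }

    /** Hypothetical paged store: each fetchPage call models one remote LIST request. */
    static class PagedStore {
        private final List<String> entries;
        private final int pageSize;
        int pagesFetched = 0;

        PagedStore(List<String> entries, int pageSize) {
            this.entries = entries;
            this.pageSize = pageSize;
        }

        List<String> fetchPage(int pageIndex) {
            pagesFetched++;
            int from = Math.min(pageIndex * pageSize, entries.size());
            int to = Math.min(from + pageSize, entries.size());
            return entries.subList(from, to);
        }

        /** Eager listing: the caller waits until every page has been fetched. */
        List<String> listStatus() {
            List<String> all = new ArrayList<>();
            for (int page = 0; ; page++) {
                List<String> chunk = fetchPage(page);
                all.addAll(chunk);
                if (chunk.size() < pageSize) {
                    break; // a short page marks the end of the listing
                }
            }
            return all;
        }

        /** Incremental listing: a page is fetched only when the caller runs out of entries. */
        RemoteIterator<String> listStatusIterator() {
            return new RemoteIterator<String>() {
                private List<String> current = new ArrayList<>();
                private int pos = 0;
                private int nextPage = 0;
                private boolean exhausted = false;

                @Override
                public boolean hasNext() {
                    while (pos >= current.size() && !exhausted) {
                        current = fetchPage(nextPage++);
                        pos = 0;
                        exhausted = current.size() < pageSize;
                    }
                    return pos < current.size();
                }

                @Override
                public String next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return current.get(pos++);
                }
            };
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> dirs = Arrays.asList("p=1", "p=2", "p=3", "p=4", "p=5");

        PagedStore eager = new PagedStore(dirs, 2);
        eager.listStatus();
        System.out.println("eager: pages fetched before first entry = " + eager.pagesFetched);

        PagedStore lazy = new PagedStore(dirs, 2);
        RemoteIterator<String> it = lazy.listStatusIterator();
        it.next();
        System.out.println("iterator: pages fetched before first entry = " + lazy.pagesFetched);
    }
}
```

In this model the eager call fetches all pages before the caller sees any entry, while the iterator hands back the first entry after a single page. How much a real file system overlaps fetching with processing depends on its implementation of the iterator.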
Why are the changes needed?
MSCK repair takes a long time to validate multi-level partition folders in S3.
Does this PR introduce any user-facing change?
No
How was this patch tested?
The TestHiveMetaStoreChecker test code covers the code changes.
Issue Time Tracking
-------------------
Worklog Id: (was: 803407)
Time Spent: 50m (was: 40m)
> MSCK repair perf issue HMSChecker ThreadPool is blocked at fs.listStatus
> ------------------------------------------------------------------------
>
> Key: HIVE-26495
> URL: https://issues.apache.org/jira/browse/HIVE-26495
> Project: Hive
> Issue Type: Bug
> Reporter: Naresh P R
> Assignee: Naresh P R
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> With hive.metastore.fshandler.threads = 15, all 15 *MSCK-GetPaths-xx* threads
> are stuck at the following stack trace.
> {code:java}
> "MSCK-GetPaths-11" #12345 daemon prio=5 os_prio=0 tid= nid= waiting on condition [0x00007f9f099a6000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000003f92d1668> (a java.util.concurrent.CompletableFuture$Signaller)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> ...
> at org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:3230)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1953)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1995)
> at org.apache.hadoop.hive.metastore.HiveMetaStoreChecker$PathDepthInfoCallable.processPathDepthInfo(HiveMetaStoreChecker.java:550)
> at org.apache.hadoop.hive.metastore.HiveMetaStoreChecker$PathDepthInfoCallable.call(HiveMetaStoreChecker.java:543)
> at org.apache.hadoop.hive.metastore.HiveMetaStoreChecker$PathDepthInfoCallable.call(HiveMetaStoreChecker.java:525)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> {code}
> We should take advantage of the non-blocking listStatusIterator instead of
> listStatus, which is a blocking call.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)