[jira] [Comment Edited] (HIVE-15879) Fix HiveMetaStoreChecker.checkPartitionDirs method

Rajesh Balamohan (JIRA) Thu, 23 Feb 2017 18:54:07 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881827#comment-15881827
 ]


Rajesh Balamohan edited comment on HIVE-15879 at 2/24/17 2:53 AM:
------------------------------------------------------------------

Thanks for sharing the details [~vihangk1]. I have a different point of view 
here.

I agree that {{ThreadPoolExecutor.getActiveCount()}} is approximate. It is 
approximate because, by the time {{getActiveCount()}} has iterated over the 
workers in the worker list, some of the threads that were executing tasks may 
already have completed. 
http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/concurrent/ThreadPoolExecutor.java#l1818
So the reported number could be slightly higher than the number of threads 
actually running. But it would never be lower, as a new Worker in 
ThreadPoolExecutor is added while holding {{mainLock}}. 

In the context of the MSCK logic, this approximation should not be a problem. 

This is because of the check {{pool.getActiveCount() < 
pool.getMaximumPoolSize()}}. If the thread pool executor reports an approximate 
value (i.e. higher than the actual number of running threads), the thread pool 
is simply not used under the current logic. So in corner cases the thread pool 
could have been used, but was skipped because of the approximate (higher) value 
reported by ThreadPoolExecutor. 
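
To illustrate the point, here is a minimal sketch of that guard in isolation. The class name {{PoolCheckSketch}} and its helper methods are hypothetical and not taken from the Hive code; only the {{getActiveCount()}} / {{getMaximumPoolSize()}} comparison comes from the snippet in the issue description. It assumes a fixed-size pool of 2 threads.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the Hive implementation.
class PoolCheckSketch {

  // Same comparison as the guard in checkPartitionDirs: use the pool only
  // if it reports fewer active threads than its maximum size.
  static boolean shouldUsePool(ThreadPoolExecutor pool) {
    if (pool == null) {
      return false;
    }
    synchronized (pool) {
      return pool.getActiveCount() < pool.getMaximumPoolSize();
    }
  }

  // Returns {decision for an idle pool, decision for a fully busy pool}.
  static boolean[] demo() {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        2, 2, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
    CountDownLatch started = new CountDownLatch(2);
    CountDownLatch release = new CountDownLatch(1);
    try {
      boolean idle = shouldUsePool(pool);
      for (int i = 0; i < 2; i++) {
        pool.execute(() -> {
          started.countDown();
          try {
            release.await(); // keep the worker busy until released
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      started.await(); // both workers are now executing tasks
      boolean busy = shouldUsePool(pool);
      return new boolean[] {idle, busy};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      release.countDown();
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    boolean[] result = demo();
    System.out.println("idle pool -> use pool: " + result[0]);
    System.out.println("busy pool -> use pool: " + result[1]);
  }
}
```

If {{getActiveCount()}} over-reports, {{shouldUsePool}} returns false and the caller falls back to the single-threaded path, which is slower but cannot hang.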



> Fix HiveMetaStoreChecker.checkPartitionDirs method
> --------------------------------------------------
>
>                 Key: HIVE-15879
>                 URL: https://issues.apache.org/jira/browse/HIVE-15879
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>         Attachments: HIVE-15879.01.patch
>
>
> HIVE-15803 fixes the msck hang issue in the 
> HiveMetaStoreChecker.checkPartitionDirs method by adding a check to see if 
> the thread pool has any spare threads. If not, it falls back to a 
> single-threaded listing of the files.
> {noformat}
>     if (pool != null) {
>       synchronized (pool) {
>         // In case of recursive calls, it is possible to deadlock with TP. Check TP usage here.
>         if (pool.getActiveCount() < pool.getMaximumPoolSize()) {
>           useThreadPool = true;
>         }
>         if (!useThreadPool) {
>           if (LOG.isDebugEnabled()) {
>             LOG.debug("Not using threadPool as active count:" + pool.getActiveCount()
>                 + ", max:" + pool.getMaximumPoolSize());
>           }
>         }
>       }
>     }
> {noformat}
> Based on the java doc of getActiveCount() below 
> bq. Returns the approximate number of threads that are actively executing tasks.
> it returns only an approximate number of threads, and it cannot be guaranteed 
> that it always returns the exact number of active threads. This still exposes 
> the method implementation to the msck hang bug in rare corner cases.
> We could either:
> 1. Use an atomic counter to track exactly how many threads are actively running
> 2. Revisit the method itself to make it much simpler. For example, look into 
> the possibility of changing the recursive implementation to an iterative 
> implementation where worker threads pick tasks from a queue until the queue 
> is empty.
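
For reference, option 1 from the description could look roughly like the sketch below. This is only an illustration under assumptions: the class {{SlotCounterSketch}} and its methods are hypothetical and do not exist in Hive or in the attached patch. The idea is that a caller reserves a slot before submitting work to the pool and releases it when the task finishes, so the count is exact instead of approximate.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of an exact "threads in use" counter.
class SlotCounterSketch {
  private final int maxSlots;
  private final AtomicInteger used = new AtomicInteger();

  SlotCounterSketch(int maxSlots) {
    this.maxSlots = maxSlots;
  }

  // Atomically reserve one slot; returns false if all slots are taken.
  boolean tryReserve() {
    while (true) {
      int current = used.get();
      if (current >= maxSlots) {
        return false;
      }
      if (used.compareAndSet(current, current + 1)) {
        return true;
      }
      // Another thread changed the counter in between; retry.
    }
  }

  // Give a previously reserved slot back, e.g. in a finally block
  // once the submitted task has finished.
  void release() {
    used.decrementAndGet();
  }

  int reserved() {
    return used.get();
  }
}
```

A caller would submit to the pool only when {{tryReserve()}} returns true and otherwise list the directory in the current thread.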



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
