[
https://issues.apache.org/jira/browse/HIVE-16949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Birger Brunswiek updated HIVE-16949:
------------------------------------
Description:
The commit
[20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
which was part of HIVE-15546 [introduced a thread
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
which is not shut down once its tasks have completed. As a result, every query
that uses more than 1 partition leaks the pool's threads, and they are never
removed automatically. Each further query spanning multiple partitions adds to
the thread count, which is never reduced. On my machine hiveserver2 gets
progressively slower once 10k threads are reached.
Thread pools only shut down automatically in special circumstances, namely when
the pool is no longer referenced and has no remaining threads (see
[documentation section
_Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
This is not currently the case for the Get-Input-Paths thread pool. I would
add a _pool.shutdown()_ in a finally block just before returning the result to
make sure the threads are really shut down.
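The proposed fix could look roughly like the sketch below. It is not the code from Utilities.java: the class {{InputPathsSketch}}, the methods {{listPaths}} and {{resolve}}, and the {{/warehouse/t/}} path are hypothetical stand-ins, and only the try/finally pattern around _pool.shutdown()_ is the point. It also assumes, as in the workaround described here, that no pool is created when the thread limit is 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class InputPathsSketch {

    // Hypothetical stand-in for the per-partition path lookup.
    static String resolve(String partition) {
        return "/warehouse/t/" + partition;
    }

    // Resolves one path per partition, using a pool only when maxThreads > 1.
    static List<String> listPaths(List<String> partitions, int maxThreads) {
        ExecutorService pool =
            maxThreads > 1 ? Executors.newFixedThreadPool(maxThreads) : null;
        try {
            List<String> result = new ArrayList<>();
            if (pool == null) {
                // Single-threaded path: no pool is created, so nothing can leak.
                for (String p : partitions) {
                    result.add(resolve(p));
                }
                return result;
            }
            List<Future<String>> futures = new ArrayList<>();
            for (String p : partitions) {
                Callable<String> task = () -> resolve(p);
                futures.add(pool.submit(task));
            }
            for (Future<String> f : futures) {
                result.add(f.get()); // preserves submission order
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while listing paths", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("listing task failed", e.getCause());
        } finally {
            // Runs on success and on failure, so the worker threads are
            // released after every call instead of accumulating.
            if (pool != null) {
                pool.shutdown();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(listPaths(Arrays.asList("p=1", "p=2", "p=3"), 4));
    }
}
```

Since all submitted tasks have already completed when the finally block runs, _shutdown()_ only has to let the idle worker threads exit; it does not cancel any work.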
My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}.
This prevents the thread pool from being spawned
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].
The same issue probably also applies to the [Get-Input-Summary thread
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].
was:
The commit
[20210de|https://github.com/apache/hive/commit/20210dec94148c9b529132b1545df3dd7be083c3]
which was part of HIVE-15546 [introduced a thread
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3109]
which is not shut down upon completion of its threads. This leads to a leak of
threads for each query which uses more than 1 partition. They are not removed
by the GC. When queries spanning multiple partitions are made the number of
threads increases and is never reduced. On my machine hiveserver2 starts to get
slower and slower once 10k threads are reached.
Thread pools only shut down automatically in special circumstances (see
[documentation section
_Finalization_|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html]).
This is not currently the case for the Get-Input-Paths thread pool. I would
add a _pool.shutdown()_ in a finally block just before returning the result to
make sure the threads are really shut down.
My current workaround is to set {{hive.exec.input.listing.max.threads = 1}}.
This prevents the thread pool from being spawned
[\[1\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2118]
[\[2\]|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L3107].
The same issue probably also applies to the [Get-Input-Summary thread
pool|https://github.com/apache/hive/blob/824b9c80b443dc4e2b9ad35214a23ac756e75234/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2193].
> Leak of threads from Get-Input-Paths thread pool when more than 1 used in
> query
> -------------------------------------------------------------------------------
>
> Key: HIVE-16949
> URL: https://issues.apache.org/jira/browse/HIVE-16949
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Reporter: Birger Brunswiek