[
https://issues.apache.org/jira/browse/BEAM-7412?focusedWorklogId=250730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-250730
]
ASF GitHub Bot logged work on BEAM-7412:
----------------------------------------
Author: ASF GitHub Bot
Created on: 30/May/19 05:50
Start Date: 30/May/19 05:50
Worklog Time Spent: 10m
Work Description: ibzib commented on pull request #8722: [BEAM-7412] shut
down executor service in fn harness
URL: https://github.com/apache/beam/pull/8722
I was somewhat surprised to find that the unbounded thread growth I
encountered when running portable validates runner tests on local Spark is was
actually caused by
[`GcsOptions.java`](https://github.com/apache/beam/blob/39dab79b7bd56a47c0f2a02033ab4ca3bd4a67e2/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/options/GcsOptions.java#L150-L156).
I'm guessing this hasn't caused problems before because the SDK harness was
always run in a separately managed process/container/whatever that terminated
when work was complete.
Incidentally, this was a pain to track down because thousands of threads
used for any number of purposes were all uniformly named `pool-N-thread-M` (as
they all used Java's
[`defaultThreadFactory`](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executors.html#defaultThreadFactory--)).
Going forward, I might look into naming Beam threads so it's easier to debug
and analyze performance.
------------------------
R: @angoenka
Post-Commit Tests Status (on master branch)
------------------------------------------------------------------------------------------------
Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
--- | --- | --- | --- | --- | --- | --- | ---
Go | [](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
| --- | --- | --- | --- | --- | ---
Java | [](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
Python | [](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)<br>[](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/)
| --- | [](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)
<br> [](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/)
| --- | --- | ---
Pre-Commit Tests Status (on master branch)
------------------------------------------------------------------------------------------------
--- |Java | Python | Go | Website
--- | --- | --- | --- | ---
Non-portable | [](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/)
| [](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/)
Portable | --- | [](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/)
| --- | ---
See
[.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md)
for trigger phrase, status and link of all Jenkins jobs.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 250730)
Time Spent: 10m
Remaining Estimate: 0h
> portable spark: thread/memory leak in local mode
> ------------------------------------------------
>
> Key: BEAM-7412
> URL: https://issues.apache.org/jira/browse/BEAM-7412
> Project: Beam
> Issue Type: Bug
> Components: runner-spark
> Reporter: Kyle Weaver
> Assignee: Kyle Weaver
> Priority: Major
> Attachments: Screenshot from 2019-05-20 14-26-03.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When running validatesPortableRunnerBatch test on local mode, the portable
> Spark runner creates ~18k threads before becoming unable to create any more
> threads, at which point it crashes. This does not happen on standalone
> cluster mode, where ephemeral executors (independent JVMs) are used and then
> shut down, preventing whatever leakage is occurring from becoming too much of
> a problem.
> Sample stack trace:
> {{"pool-3965-thread-3" #16059 daemon prio=5 os_prio=0 tid=0x00007f9f85a81000
> nid=0x32a0 waiting on condition [0x00007f9f8b9f1000]}}
> {{ java.lang.Thread.State: TIMED_WAITING (parking)}}
> {{ at (C/C++) 0x00007fa2f32c2dae (Unknown Source)}}
> {{ at (C/C++) 0x00007fa2f2851351 (Unknown Source)}}
> {{ at sun.misc.Unsafe.park(Native Method)}}
> {{ - parking to wait for <0x0000000727bda848> (a
> java.util.concurrent.SynchronousQueue$TransferStack)}}
> {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}}
> {{ at
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)}}
> {{ at
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)}}
> {{ at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)}}
> {{ at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)}}
> {{ at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)}}
> {{ at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
> {{ at java.lang.Thread.run(Thread.java:748)}}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)