[jira] [Commented] (BEAM-5467) Python Flink ValidatesRunner job fixes

Scott Wegner (JIRA) Thu, 11 Oct 2018 11:12:09 -0700


    [ 
https://issues.apache.org/jira/browse/BEAM-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646845#comment-16646845
 ]


Scott Wegner commented on BEAM-5467:
------------------------------------

Thanks for the additional context. I'm not an expert on diagnosing memory 
issues, but here's what I can pull out of the build scan:

* The build scan shows [some stats on memory 
usage|https://scans.gradle.com/s/f2u3q2obrgaqu/performance/build#memory], and 
for this build I see "PS Eden Space" at 1.36/1.36 GB (99.5%). From that I would 
deduce that the JVM ran out of its allotted memory, which caused the segfault.
* The [infrastructure 
tab|https://scans.gradle.com/s/f2u3q2obrgaqu#infrastructure] shows the "Max JVM 
memory heap size" for the job: 3824 MB
* In the [timeline|https://scans.gradle.com/s/f2u3q2obrgaqu/timeline] I can see 
that the task that failed was 
{{:beam-sdks-python:flinkCompatibilityMatrixBatch}}. Nothing was running 
concurrently as part of the build, so either this task ate up the entire heap 
space, or some previous task is leaking memory.
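
For a local repro, the same kind of figures the build scan reports (max heap 
size and per-pool usage) can be read from inside a running JVM through the 
standard {{java.lang.management}} API. The following is only a minimal sketch 
to show where such numbers come from; the class name {{HeapReport}} and the 
helper {{percentUsed}} are made up for illustration and are not part of Beam 
or Gradle. Pool names depend on the garbage collector in use ("PS Eden Space" 
appears with the parallel collector), so the output on another machine may 
list different pools.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper class, for illustration only.
class HeapReport {

    // Percentage of a pool's limit currently in use.
    // Returns -1 when the JVM reports no defined maximum for the pool.
    static double percentUsed(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0;
        }
        return 100.0 * usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        // Upper bound on the heap this JVM will attempt to use.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max JVM heap size: " + maxHeapMb + " MB");

        // One line per memory pool (Eden, Survivor, Old Gen, Metaspace, ...).
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            double usedMb = usage.getUsed() / (1024.0 * 1024.0);
            System.out.printf("%-30s %10.1f MB used, %6.1f%% of max%n",
                    pool.getName(), usedMb,
                    percentUsed(usage.getUsed(), usage.getMax()));
        }
    }
}
```

This only reports on the JVM it runs in, so it is a way to understand the 
numbers rather than a replacement for attaching a profiler to the Gradle 
daemon or the worker JVMs.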

My recommendation would be to work towards getting a local repro so that you 
can attach a memory profiler and validate potential fixes. The Jenkins job 
shows the full command-line used to launch the job, including JVM memory 
configuration:

{{gradlew --info --continue --max-workers=12 -Dorg.gradle.jvmargs=-Xms2g 
-Dorg.gradle.jvmargs=-Xmx4g :beam-sdks-python:flinkCompatibilityMatrixBatch 
:beam-sdks-python:flinkCompatibilityMatrixStreaming}}

> Python Flink ValidatesRunner job fixes
> --------------------------------------
>
>                 Key: BEAM-5467
>                 URL: https://issues.apache.org/jira/browse/BEAM-5467
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-flink
>            Reporter: Thomas Weise
>            Assignee: Thomas Weise
>            Priority: Minor
>              Labels: portability-flink
>          Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> Add status to README
> Rename script and job for consistency
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
