[jira] [Commented] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

Lukasz Gajowy (JIRA) Wed, 07 Aug 2019 03:22:37 -0700


    [ 
https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901955#comment-16901955
 ]


Lukasz Gajowy commented on BEAM-6923:
-------------------------------------

[~angoenka] I added a new screenshot of the heap dump sorted by size. For 
future investigation, though, I don't think screenshots are the best approach. 
The issue is easy to reproduce, and you can add:
{code:java}
"-XX:+HeapDumpOnOutOfMemoryError"{code}
to the JVM options in the 
[flink_job_server.gradle|https://github.com/apache/beam/blob/ff0f308fb83056bd2ba990de2edec33a0c6c7720/runners/flink/job-server/flink_job_server.gradle#L112]
 file. This generates a full heap dump as soon as the error occurs. Once you 
have the dump, you can use the jvisualvm tool from the JDK to investigate it 
(I just learned that! :) ). See more 
[here|https://docs.oracle.com/javase/7/docs/webnotes/tsg/TSG-VM/html/clopts.html#gbzrr]
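
A minimal sketch for confirming that the flag actually reached a JVM, assuming a HotSpot-based JDK (the class name HeapDumpCheck is hypothetical and not part of Beam). It reads the current value of the option through the JDK's HotSpotDiagnosticMXBean:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Beam: reports whether the running
// HotSpot JVM has HeapDumpOnOutOfMemoryError enabled.
public class HeapDumpCheck {

    // Returns the flag's current value as a string: "true" or "false".
    public static String heapDumpOnOomValue() {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption("HeapDumpOnOutOfMemoryError").getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError=" + heapDumpOnOomValue());
    }
}
```

On JDK 11 or later this can be run directly as a single source file, for example with java -XX:+HeapDumpOnOutOfMemoryError HeapDumpCheck.java. The same check could be made against the job server process, but that wiring is not shown here.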

 

As for the pipelines: I encountered the error while running performance tests 
of core Beam operations. I tried 
[GroupByKeyLoadTest|https://github.com/apache/beam/blob/e0cbd2aa7e371c75511544ab78075d54f3f086ca/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/GroupByKeyLoadTest.java]
 and 
[ParDoLoadTest|https://github.com/apache/beam/blob/e0cbd2aa7e371c75511544ab78075d54f3f086ca/sdks/java/testing/load-tests/src/main/java/org/apache/beam/sdk/loadtests/ParDoLoadTest.java].
 Example command:
{code:java}
./gradlew --continue --max-workers=12 -Dorg.gradle.jvmargs=-Xms2g 
-Dorg.gradle.jvmargs=-Xmx4g 
-PloadTest.mainClass=org.apache.beam.sdk.loadtests.ParDoLoadTest 
'-PloadTest.args=--sourceOptions={"numRecords":100,"keySizeBytes":1,"valueSizeBytes":9}
 --iterations=1 --runner=PortableRunner --jobEndpoint=localhost:8099 
--defaultEnvironmentType=DOCKER 
--defaultEnvironmentConfig=gcr.io/apache-beam-testing/beam/java:latest' 
:beam-sdks-java-load-tests:run -Prunner=":runners:reference:java"{code}
As far as I understand, [~marcelo.castro] is running something completely 
different on Spark and still gets the error.

 

> OOM errors in jobServer when using GCS artifactDir
> --------------------------------------------------
>
>                 Key: BEAM-6923
>                 URL: https://issues.apache.org/jira/browse/BEAM-6923
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-harness
>            Reporter: Lukasz Gajowy
>            Priority: Major
>         Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png, heapdump size-sorted.png
>
>
> When starting jobServer with artifactDir pointing to a GCS bucket: 
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code}
> and running a Java portable pipeline with the following portability-related 
> pipeline options: 
> {code:java}
> --runner=PortableRunner --jobEndpoint=localhost:8099 
> --defaultEnvironmentType=DOCKER 
> --defaultEnvironmentConfig=gcr.io/<my-freshly-built-sdk-harness-image>/java:latest{code}
>  
> I'm facing a series of OOM errors, like this: 
> {code:java}
> Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: 
> Java heap space
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
> at 
> com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
>  
> This does not happen when I'm using a local filesystem for the artifact 
> staging location. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
