[
https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902012#comment-16902012
]
Marcelo Pio de Castro edited comment on BEAM-6923 at 8/7/19 11:57 AM:
----------------------------------------------------------------------
Sorry for the late response.
[~iemejia], the library was guava-20, but at the time I was using mismatched
versions of Beam and Spark (Spark was on an earlier version than Apache
Beam's dependency). I'll retest this with the newer version of Beam and get
back to you.
As for the thread issue: I was uploading files that could be 300+ GB in size
using the AvroIO sink. At the time I had a very limited cluster of Spark
workers, with about 4 GB of RAM. The problem is in the async HTTP upload that
the GCS library uses.
I think an option should exist to choose a sync upload method for
limited-memory cluster scenarios. (If such an option exists, I couldn't find it.)
I solved my problem by doing the upload myself in a ParDo instead of using the
AvroIO sink, uploading synchronously. Even an async method could work with
more limited RAM, but that is a problem with the Google library.
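The bounded-memory idea behind the sync approach can be sketched in plain Java: stream the file through one fixed-size buffer, so peak heap use per upload is a single buffer rather than the async uploader's queued chunks. This is an illustration only, not the actual fix; the Beam DoFn wiring and the GCS client call are omitted, and {{copySync}} is a hypothetical helper name:
{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class SyncUpload {
    // Streams data through one fixed 8 MiB buffer. Peak heap use per upload
    // is a single buffer regardless of total file size, because write()
    // blocks until the previous chunk has been handed off.
    static long copySync(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8 * 1024 * 1024];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
{code}
In the actual ParDo, {{out}} would be a write channel opened against GCS per element; the point is that nothing beyond one buffer is ever held in memory at a time.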
> OOM errors in jobServer when using GCS artifactDir
> --------------------------------------------------
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-harness
> Reporter: Lukasz Gajowy
> Priority: Major
> Attachments: Instance counts.png, Paths to GC root.png,
> Telemetries.png, heapdump size-sorted.png
>
>
> When starting jobServer with artifactDir pointing to a GCS bucket:
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow
> -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code}
> and running a Java portable pipeline with the following, portability related
> pipeline options:
> {code:java}
> --runner=PortableRunner --jobEndpoint=localhost:8099
> --defaultEnvironmentType=DOCKER
> --defaultEnvironmentConfig=gcr.io/<my-freshly-built-sdk-harness-image>/java:latest{code}
>
> I'm facing a series of OOM errors, like this:
> {code:java}
> Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError:
> Java heap space
> at
> com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
> at
> com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
> at
> com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
> at
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
> at
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
> at
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
> at
> com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
>
> This does not happen when I'm using a local filesystem for the artifact
> staging location.
>
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)