[ 
https://issues.apache.org/jira/browse/BEAM-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332005#comment-17332005
 ] 

Tim Robertson commented on BEAM-10670:
--------------------------------------

Following the update to 2.28.0, the GBIF pipelines 
(https://github.com/gbif/pipelines) started to have issues. This affected the 
production pipelines on GBIF.org running on Spark/YARN and those for ala.org.au, 
which run on Spark standalone on AWS.

I'm told that they observed a significant memory increase, slower performance, 
and failing jobs with "GC overhead exceeded" for simple AvroIO-to-AvroIO 
pipelines. Both now run with the use_deprecated_read option.

We can help run tests, but I think the cause needs to be fixed before the 
legacy functionality is removed.

> Make non-portable Splittable DoFn the only option when executing Java "Read" 
> transforms
> ---------------------------------------------------------------------------------------
>
>                 Key: BEAM-10670
>                 URL: https://issues.apache.org/jira/browse/BEAM-10670
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Luke Cwik
>            Priority: P3
>              Labels: Clarified
>          Time Spent: 37h 50m
>  Remaining Estimate: 0h
>
> All runners seem to be capable of migrating to splittable DoFn for 
> non-portable execution except for Dataflow runner v1 which will internalize 
> the current primitive read implementation that is shared across runner 
> implementations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
