[ https://issues.apache.org/jira/browse/BEAM-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973245#comment-15973245 ]
Tobias Feldhaus edited comment on BEAM-1997 at 4/18/17 6:33 PM:
----------------------------------------------------------------
You are correct; I posted the wrong screenshots, sorry. I did run it with the
mentioned number of files, though. I only uncompressed the gzip files for the
test runs to save time. Nevertheless, I will rerun it to take new, correct
screenshots, and while doing that I will also move out the
{{ParseIntoJson}}. :)
was (Author: james-woods):
You are correct; I posted the wrong screenshots, sorry. I will rerun it and
post correct ones. I did run it with the mentioned number of files, though. I
uncompressed the gzip files for the test runs to save time.
Nevertheless, while doing that I will also move out the {{ParseIntoJson}}.
> Scaling Problem of Beam (size of the serialized JSON representation of the
> pipeline exceeds the allowable limit)
> ----------------------------------------------------------------------------------------------------------------
>
> Key: BEAM-1997
> URL: https://issues.apache.org/jira/browse/BEAM-1997
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Affects Versions: 0.6.0
> Reporter: Tobias Feldhaus
> Assignee: Daniel Halperin
>
> After switching from Dataflow SDK 1.9 to Apache Beam SDK 0.6, my pipeline no
> longer runs with 180 output days (BigQuery partitions as sinks), but only
> with 60 output days. With a larger number of days under Beam, the Cloud
> Dataflow service responds as follows:
> {code}
> Failed to create a workflow job: The size of the serialized JSON
> representation of the pipeline exceeds the allowable limit. For more
> information, please check the FAQ link below:
> {code}
> This is the pipeline in Dataflow:
> https://gist.github.com/james-woods/f84b6784ee6d1b87b617f80f8c7dd59f
> The resulting graph in Dataflow looks like this:
> https://puu.sh/vhWAW/a12f3246a1.png
> This is the same pipeline in Beam:
> https://gist.github.com/james-woods/c4565db769bffff0494e0bef5e9c334c
> The constructed graph looks somewhat different:
> https://puu.sh/vhWvm/78a40d422d.png
> The methods used are taken from this example:
> https://gist.github.com/dhalperi/4bbd13021dd5f9998250cff99b155db6
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)