[ https://issues.apache.org/jira/browse/BEAM-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974777#comment-15974777 ]
Tobias Feldhaus commented on BEAM-1997:
---------------------------------------

Mea culpa, it seems I had more than one file per day, leading to a 3-4 times larger pipeline; this explains the problem.

> Scaling Problem of Beam (size of the serialized JSON representation of the pipeline exceeds the allowable limit)
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-1997
>                 URL: https://issues.apache.org/jira/browse/BEAM-1997
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 0.6.0
>            Reporter: Tobias Feldhaus
>            Assignee: Daniel Halperin
>
> After switching from Dataflow SDK 1.9 to Apache Beam SDK 0.6, my pipeline no longer runs with 180 output days (BigQuery partitions as sinks), but only with 60 output days. When using a larger number with Beam, the response from the Cloud Dataflow service reads as follows:
> {code}
> Failed to create a workflow job: The size of the serialized JSON
> representation of the pipeline exceeds the allowable limit. For more
> information, please check the FAQ link below:
> {code}
> This is the pipeline in Dataflow:
> https://gist.github.com/james-woods/f84b6784ee6d1b87b617f80f8c7dd59f
> The resulting graph in Dataflow looks like this:
> https://puu.sh/vhWAW/a12f3246a1.png
> This is the same pipeline in Beam:
> https://gist.github.com/james-woods/c4565db769bffff0494e0bef5e9c334c
> The constructed graph looks somewhat different:
> https://puu.sh/vhWvm/78a40d422d.png
> Methods used are taken from this example:
> https://gist.github.com/dhalperi/4bbd13021dd5f9998250cff99b155db6

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)