[ https://issues.apache.org/jira/browse/BEAM-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tobias Feldhaus resolved BEAM-1997.
-----------------------------------
Resolution: Invalid
Fix Version/s: 0.6.0
> Scaling Problem of Beam (size of the serialized JSON representation of the
> pipeline exceeds the allowable limit)
> ----------------------------------------------------------------------------------------------------------------
>
> Key: BEAM-1997
> URL: https://issues.apache.org/jira/browse/BEAM-1997
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Affects Versions: 0.6.0
> Reporter: Tobias Feldhaus
> Assignee: Daniel Halperin
> Fix For: 0.6.0
>
>
> After switching from Dataflow SDK 1.9 to Apache Beam SDK 0.6, my pipeline no
> longer runs with 180 output days (BigQuery partitions as sinks), but only
> with 60 output days. With a larger number of output days in Beam, the
> Cloud Dataflow service responds as follows:
> {code}
> Failed to create a workflow job: The size of the serialized JSON
> representation of the pipeline exceeds the allowable limit. For more
> information, please check the FAQ link below:
> {code}
> This is the pipeline written with the Dataflow SDK:
> https://gist.github.com/james-woods/f84b6784ee6d1b87b617f80f8c7dd59f
> The resulting graph in Dataflow looks like this:
> https://puu.sh/vhWAW/a12f3246a1.png
> This is the same pipeline in Beam:
> https://gist.github.com/james-woods/c4565db769bffff0494e0bef5e9c334c
> The constructed graph looks somewhat different:
> https://puu.sh/vhWvm/78a40d422d.png
> The methods used are taken from this example:
> https://gist.github.com/dhalperi/4bbd13021dd5f9998250cff99b155db6
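
The fan-out described in the report, one BigQuery partition sink per output day, can be illustrated with a minimal sketch. It deliberately does not use the Beam or Dataflow SDK and is not the reporter's pipeline: the class name, method name, and table name are hypothetical, and it assumes each output day maps to exactly one partition decorator of the form table$yyyyMMdd. It only shows that the number of sinks, and with it the size of the job description, grows linearly with the number of output days.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class PartitionSinks {
    // BigQuery partition decorators use the form table$yyyyMMdd;
    // BASIC_ISO_DATE renders a LocalDate as yyyyMMdd.
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;

    // Returns one table spec per output day (hypothetical helper, not a Beam API).
    static List<String> partitionSpecs(String table, LocalDate firstDay, int days) {
        List<String> specs = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            specs.add(table + "$" + firstDay.plusDays(i).format(DAY));
        }
        return specs;
    }

    public static void main(String[] args) {
        // Assumed table name and start date, for illustration only.
        List<String> specs =
            partitionSpecs("project:dataset.events", LocalDate.of(2017, 1, 1), 180);
        // Each entry would correspond to one sink in the pipeline graph.
        System.out.println(specs.size() + " sinks, first = " + specs.get(0));
    }
}
```

If every entry becomes its own write step, tripling the day count from 60 to 180 roughly triples the sink-related part of the serialized graph.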
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)