[ 
https://issues.apache.org/jira/browse/BEAM-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974777#comment-15974777
 ] 

Tobias Feldhaus commented on BEAM-1997:
---------------------------------------

Mea culpa: it seems I had more than one file per day, leading to a pipeline 3-4 
times larger. That explains the problem. 
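The scaling effect described here can be illustrated with a rough back-of-envelope sketch (plain Python, not Beam code; the transform names and JSON shape are placeholders, since the actual serialized pipeline format is Dataflow-internal):

```python
import json

def serialized_size(days, files_per_day):
    """Model each per-day, per-file BigQuery write as one transform
    in a hypothetical serialized JSON pipeline representation."""
    transforms = [
        {"name": f"WriteDay{d}File{f}", "type": "BigQueryIO.Write"}
        for d in range(days)
        for f in range(files_per_day)
    ]
    return len(json.dumps({"transforms": transforms}))

# The JSON grows roughly linearly with the number of transforms, so
# 3-4 files per day instead of one yields a pipeline description
# roughly 3-4 times larger, which can push it over the service limit.
print(serialized_size(180, 4) / serialized_size(180, 1))
```

This is only meant to show why the extra files per day, not the Beam SDK itself, account for the size increase.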

> Scaling Problem of Beam (size of the serialized JSON representation of the 
> pipeline exceeds the allowable limit)
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-1997
>                 URL: https://issues.apache.org/jira/browse/BEAM-1997
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 0.6.0
>            Reporter: Tobias Feldhaus
>            Assignee: Daniel Halperin
>
> After switching from Dataflow SDK 1.9 to Apache Beam SDK 0.6, my pipeline no 
> longer runs with 180 output days (BigQuery partitions as sinks), but only 
> with 60 output days. When using a larger number of days with Beam, the 
> response from the Cloud Dataflow service reads as follows:
> {code}
> Failed to create a workflow job: The size of the serialized JSON 
> representation of the pipeline exceeds the allowable limit. For more 
> information, please check the FAQ link below:
> {code}
> This is the pipeline in Dataflow: 
> https://gist.github.com/james-woods/f84b6784ee6d1b87b617f80f8c7dd59f
> The resulting graph in Dataflow looks like this: 
> https://puu.sh/vhWAW/a12f3246a1.png
> This is the same pipeline in Beam: 
> https://gist.github.com/james-woods/c4565db769bffff0494e0bef5e9c334c
> The constructed graph looks somewhat different:
> https://puu.sh/vhWvm/78a40d422d.png
> Methods used are taken from this example 
> https://gist.github.com/dhalperi/4bbd13021dd5f9998250cff99b155db6



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
