[ 
https://issues.apache.org/jira/browse/BEAM-12191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325207#comment-17325207
 ] 

Yunqing Zhou commented on BEAM-12191:
-------------------------------------

There's a workaround for this issue:
 # Generate the template with --experiment=upload_graph.
 # Trim the template with the following script:


{code:python}
#!/usr/bin/env python3
# Usage: ./trim.py gs://<bucket>/original_template_file gs://<bucket>/trimmed_template_file

import json
import subprocess
import sys

file_in = sys.argv[1]
file_out = sys.argv[2]

# Download the original template to a local file.
subprocess.check_call(['gsutil', 'cp', file_in, '/tmp/template.json'])
with open('/tmp/template.json') as f:
  template_obj = json.load(f)

# Drop the inline steps and point stepsLocation at the graph file
# under the staging location instead.
staging_location = template_obj['environment']['sdkPipelineOptions']['options']['staging_location']
template_obj['steps'] = []
template_obj['stepsLocation'] = staging_location.rstrip('/') + '/dataflow_graph.json'

with open('/tmp/template.trimmed.json', 'w') as f:
  json.dump(template_obj, f, indent=1)

# Upload the trimmed template.
subprocess.check_call(['gsutil', 'cp', '/tmp/template.trimmed.json', file_out])
{code}
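
To sanity-check the trimming logic without touching GCS, the same transformation can be pulled out into a pure function and run on a local dict. This is only a sketch: the function name {{trim_template}} and the sample template are made up for illustration, and the sample contains only the keys the script above reads or writes (a real template has many more fields).

```python
import copy
import json


def trim_template(template_obj):
  """Return a copy of the template with inline steps replaced by stepsLocation."""
  options = template_obj['environment']['sdkPipelineOptions']['options']
  staging_location = options['staging_location']
  trimmed = copy.deepcopy(template_obj)
  trimmed['steps'] = []
  trimmed['stepsLocation'] = staging_location.rstrip('/') + '/dataflow_graph.json'
  return trimmed


# Hypothetical, heavily reduced template used only for this check.
sample = {
    'environment': {
        'sdkPipelineOptions': {
            'options': {'staging_location': 'gs://example-bucket/staging'}
        }
    },
    'steps': [{'kind': 'ParallelRead', 'name': 's1'}],
}

trimmed = trim_template(sample)
print(json.dumps(trimmed, indent=1))
```

The function works on a deep copy, so the original dict stays available for comparing sizes before and after trimming.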



> python DataflowRunner upload_graph feature doesn't reduce template file size
> ----------------------------------------------------------------------------
>
>                 Key: BEAM-12191
>                 URL: https://issues.apache.org/jira/browse/BEAM-12191
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Yunqing Zhou
>            Priority: P2
>
> This is the Python version of https://issues.apache.org/jira/browse/BEAM-7797.
>  
> The upload_graph trimming happens after the template file has already been dumped:
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py]
> !https://screenshot.googleplex.com/yYppcL6ZkbUXYFu.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
