[
https://issues.apache.org/jira/browse/BEAM-12191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325207#comment-17325207
]
Yunqing Zhou commented on BEAM-12191:
-------------------------------------
There's a workaround for the issue:
# Generate the template with --experiment=upload_graph.
# Trim the template with the following script:
{code:python}
#!/usr/bin/env python3
# Usage: ./trim.py gs://<bucket>/original_template_file gs://<bucket>/trimmed_template_file
import json
import posixpath
import subprocess
import sys

file_in = sys.argv[1]
file_out = sys.argv[2]

# Download the original template to a local scratch file.
subprocess.check_call(['gsutil', 'cp', file_in, '/tmp/template.json'])

with open('/tmp/template.json') as f:
    template_obj = json.load(f)

# Drop the inline steps and point stepsLocation at the graph file
# in the staging location instead.
template_obj['steps'] = []
# gs:// paths always use forward slashes, so join with posixpath.
template_obj['stepsLocation'] = posixpath.join(
    template_obj['environment']['sdkPipelineOptions']['options']['staging_location'],
    'dataflow_graph.json')

with open('/tmp/template.trimmed.json', 'w') as f:
    json.dump(template_obj, f, indent=True)

# Upload the trimmed template once the local file has been closed.
subprocess.check_call(['gsutil', 'cp', '/tmp/template.trimmed.json', file_out])
{code}
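The transformation the script applies can be checked locally without touching GCS. The following is a minimal sketch, assuming the template JSON has the shape the script relies on (a top-level `steps` list and `environment.sdkPipelineOptions.options.staging_location`). The function name `trim_template`, the bucket name, and the sample step are hypothetical and only for illustration:

```python
import copy
import posixpath


def trim_template(template_obj):
    """Return a copy of the template with its inline steps replaced by a stepsLocation pointer."""
    trimmed = copy.deepcopy(template_obj)
    options = trimmed['environment']['sdkPipelineOptions']['options']
    trimmed['steps'] = []
    # gs:// paths always use forward slashes, so join with posixpath.
    trimmed['stepsLocation'] = posixpath.join(
        options['staging_location'], 'dataflow_graph.json')
    return trimmed


# Hypothetical, heavily reduced template; a real one carries many more fields.
sample = {
    'steps': [{'kind': 'ParallelRead', 'name': 's1'}],
    'environment': {
        'sdkPipelineOptions': {
            'options': {'staging_location': 'gs://example-bucket/staging'},
        },
    },
}

trimmed = trim_template(sample)
print(trimmed['stepsLocation'])
```

Working on a deep copy leaves the original object untouched, which makes it easy to compare the two before uploading anything.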
> python DataflowRunner upload_graph feature doesn't reduce template file size
> ----------------------------------------------------------------------------
>
> Key: BEAM-12191
> URL: https://issues.apache.org/jira/browse/BEAM-12191
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Yunqing Zhou
> Priority: P2
>
> This is the Python version of https://issues.apache.org/jira/browse/BEAM-7797.
>
> The upload_graph trimming happens after the template file is dumped:
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py]
> !https://screenshot.googleplex.com/yYppcL6ZkbUXYFu.png!
--
This message was sent by Atlassian Jira
(v8.3.4#803005)