[
https://issues.apache.org/jira/browse/BEAM-6745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779662#comment-16779662
]
Robert Burke edited comment on BEAM-6745 at 2/27/19 10:15 PM:
--------------------------------------------------------------
There's no *Dataflow-side* documentation of the Go SDK, nor any statement of
support that I'm aware of. At present, if you use the Go SDK on Dataflow, you
do so at your own risk, even though Google does fund some work on Apache Beam.
It's a quirk of portability that it enables "unofficial" language SDK support
on any compatible runner. However, nothing guarantees this, and there's no
effort to maintain anything around it (as this bug shows).
Official support would mean that each version of the Dataflow library provides
a compatible, versioned SDK container without users ever needing to specify
anything, that tests for specific versions of the SDK run successfully against
the service, and so on.
In short, the Go SDK *can* run on Dataflow, but that doesn't necessarily mean
folks should use it.
I'm hoping to be able to change that, but I can't speak to any timelines at
present.
Edit: I think the point I'm trying to make here is that the Go SDK tries to
support Dataflow, but Dataflow, as a paid service, doesn't support the Go SDK,
because certain expectations come with money being involved.
> Cannot run pipeline on Dataflow (GO SDK)
> ----------------------------------------
>
> Key: BEAM-6745
> URL: https://issues.apache.org/jira/browse/BEAM-6745
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow, sdk-go
> Reporter: Michael Chemani
> Priority: Major
>
> I got
> ```
> Failed to retrieve staged files: failed to retrieve worker in 3 attempts:
> bad MD5 for /var/opt/google/staged/worker: d79JZxFttnJG7SPkF30ozA==, want ;
> bad MD5 for /var/opt/google/staged/worker: d79JZxFttnJG7SPkF30ozA==, want ;
> bad MD5 for /var/opt/google/staged/worker: d79JZxFttnJG7SPkF30ozA==, want ;
> bad MD5 for /var/opt/google/staged/worker: d79JZxFttnJG7SPkF30ozA==, want
> ```
>
> When trying to run
> ```
> dataflow \
>   --runner dataflow \
>   --index gs://{BUCKET}/data_100k.csv \
>   --output gs://{BUCKET}/ \
>   --project {PROJECT} \
>   --temp_location gs://{BUCKET}/tmp/ \
>   --staging_location gs://{BUCKET}/binaries/ \
>   --worker_harness_container_image=apache-docker-beam-snapshots-docker.bintray.io/beam/go:20180515
> ```
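The failure log reports a base64-encoded MD5 digest for the staged worker binary next to an empty expected value (nothing follows `want`). The following is a minimal Go sketch, not the Beam SDK's actual code. It assumes the worker compares the standard base64 encoding of the file's MD5 digest against an expected string. The names `md5Base64`, `verifyMD5`, and the `md5check` usage line are hypothetical. It illustrates why an empty expected value can never match, so retrying the retrieval cannot succeed.

```go
package main

import (
	"crypto/md5"
	"encoding/base64"
	"fmt"
	"os"
)

// md5Base64 returns the standard base64 encoding of the MD5 digest of data,
// the same shape as "d79JZxFttnJG7SPkF30ozA==" in the log above.
func md5Base64(data []byte) string {
	sum := md5.Sum(data)
	return base64.StdEncoding.EncodeToString(sum[:])
}

// verifyMD5 is a hypothetical stand-in for the worker-side check (an
// assumption, not the SDK's code). A real digest is never the empty string,
// so an empty want always produces an error.
func verifyMD5(name string, data []byte, want string) error {
	if got := md5Base64(data); got != want {
		return fmt.Errorf("bad MD5 for %s: %s, want %s", name, got, want)
	}
	return nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: md5check <file> [expected-base64-md5]")
		return
	}
	path := os.Args[1]
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	want := "" // mirrors the empty expected value in the reported error
	if len(os.Args) > 2 {
		want = os.Args[2]
	}
	if err := verifyMD5(path, data, want); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("MD5 ok")
}
```

Running it against a locally built worker binary without an expected value reproduces the same kind of "want" mismatch, which points at the missing expected digest rather than a corrupted upload.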