Thanks for sharing your ideas for load testing!
According to other contributors' knowledge/experience: I noticed that streaming with KafkaIO is currently supported by wrapping it in an ExternalTransform in the Python SDK. Do you think that streaming pipelines will "just work" with the current state of portability if I do the same for UnboundedSyntheticSource, or is there something else missing?
Basically yes, but it requires a bit more effort than just wrapping the
transform in an ExternalTransform. You need to provide an
ExternalTransformBuilder for the transform so that it can be configured
externally.
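To illustrate, a minimal sketch of what such a Python-side wrapper could look like once the Java transform is registered with an expansion service via an ExternalTransformBuilder; the URN, payload format and expansion service address below are hypothetical placeholders, not real values:

import apache_beam as beam
from apache_beam.transforms.external import ExternalTransform

# NOTE: the URN, payload encoding and expansion service address are
# hypothetical; the real values depend on how the Java transform is
# registered through its ExternalTransformBuilder.
SYNTHETIC_UNBOUNDED_URN = 'beam:external:java:synthetic:unbounded:v1'
EXPANSION_SERVICE_ADDRESS = 'localhost:8097'


class UnboundedSyntheticSource(beam.PTransform):
  """Python-side wrapper that defers expansion to a Java expansion service."""

  def __init__(self, payload):
    # `payload` carries the source configuration in whatever serialized
    # form the Java ExternalTransformBuilder expects.
    super().__init__()
    self._payload = payload

  def expand(self, pbegin):
    return pbegin | ExternalTransform(
        SYNTHETIC_UNBOUNDED_URN, self._payload, EXPANSION_SERVICE_ADDRESS)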
In portability, UnboundedSources can only be supported via SDF. To still
be able to use legacy IO which uses UnboundedSource, the Runner has to
supply this capability (which the Flink Runner does). This will likely
go away once we have an UnboundedSource SDF wrapper :)
As for the stability of the expansion protocol, I think it's relatively
stable and won't change fundamentally.
Cheers,
Max
On 09.05.19 16:21, Łukasz Gajowy wrote:
My recommendation is that we make sure the protocol is stable and
implemented on the Python SDK side before starting the Go SDK side,
since that work is already in progress.
+1 This is exactly the roadmap that I had in mind - start with
externalizing and using the synthetic sources in the Python SDK and then
proceed with Go. It's still worth knowing what's going on there, which is
why I asked. :)
Thanks,
Łukasz
On Thu, May 9, 2019 at 16:03 Robert Burke <rob...@frantil.com> wrote:
Currently the Go SDK doesn't have cross-language support
implemented. My recommendation is that we make sure the protocol is
stable and implemented on the Python SDK side before starting the Go
SDK side, since that work is already in progress.
The relevant state of the Go SDK:
* beam.External exists for specifying external transforms in Go. (See
the Go SDK's PubSubIO for an example.)
* The generated Go code for the protos, including the Expansion
service API, was refreshed last week, meaning the work isn't blocked
on that.
In principle, the work would be to:
* Ensure that the SDK side of job submission connects to the Expansion
service if needed, looks up the relevant transforms against it, and
does the appropriate pipeline graph surgery. This may be something
that's handled as some kind of hook and registration process for
generally handling external transforms SDK-side.
* Have some kind of external transform to specify and configure on
the Go side.
Most of this can be ported from the Python implementation once it's
stabilized.
As with all my recommendations on how to do things with the Go SDK,
feel free to ignore it and forge ahead. I look forward to someone
tackling this, whenever it happens!
Your friendly neighborhood distributed gopher wrangler,
Robert Burke
Related:
PR 8531 [1] begins adding automated testing of the Go SDK against
Flink, which should help ensure this eventual work keeps working.
[1]: https://github.com/apache/beam/pull/8531
On Thu, May 9, 2019, 6:32 AM Łukasz Gajowy <lgaj...@apache.org> wrote:
Hi,
part of the work that needs to be done to create tests for Core
Apache Beam operations is to enable both batch and streaming
testing scenarios in all SDKs (not only Java, so lots of
portability usage here). I gathered some thoughts on how (I
think) this could be done at the current state of Beam:
https://s.apache.org/portable-load-tests
I added some questions at the end of the doc but I will paste
them here too for visibility:
* What is the status of Cross Language Support for the Go SDK? Is
it non-trivial to enable such support (if it doesn't exist yet)?
* According to other contributors' knowledge/experience: I
noticed that streaming with KafkaIO is currently supported
by wrapping it in an ExternalTransform in the Python SDK. Do you
think that streaming pipelines will "just work" with the
current state of portability if I do the same for
UnboundedSyntheticSource, or is there something else missing?
(A rough sketch of what I have in mind follows below.)
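To make the question concrete, a minimal sketch of the kind of streaming usage I have in mind, assuming a hypothetical URN and payload, a locally running expansion service, and a portable (e.g. Flink) job endpoint; none of these values are real:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.external import ExternalTransform

# All endpoints, the URN and the payload below are illustrative placeholders.
options = PipelineOptions([
    '--runner=PortableRunner',
    '--job_endpoint=localhost:8099',  # e.g. a Flink job server
    '--streaming',
])

with beam.Pipeline(options=options) as p:
  _ = (p
       | 'ReadSynthetic' >> ExternalTransform(
           'beam:external:java:synthetic:unbounded:v1',  # hypothetical URN
           b'<serialized source configuration>',         # placeholder payload
           'localhost:8097')                             # expansion service
       | 'Log' >> beam.Map(print))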
BTW: great to see Cross Language Support happening. Thanks for
doing this! :)
Thanks,
Łukasz