Thanks for sharing your ideas for load testing!

According to other contributors knowledge/experience: I noticed that streaming with KafkaIO is currently supported by wrapping the ExternalTransform in Python SDK. Do you think that streaming pipelines will "just work" with the current state of portability if I do the same for UnboundedSyntheticSource or is there something else missing?

Basically yes, but it requires a bit more effort than just wrapping about ExternalTransform. You need to provide an ExternalTransformBuilder for the transform to be configured externally.

In portability UnboundedSources can only be supported via SDF. To still be able to use legacy IO which uses UnboundedSource the Runner has to supply this capability (which the Flink Runner supports). This will likely go away if we have an UnboundedSource SDF Wrapper :)

As for the stability of the expansion protocol, I think it's relatively stable and won't changed fundamentally.

Cheers,
Max

On 09.05.19 16:21, Łukasz Gajowy wrote:
My recommendation is that we make sure the protocol is stable and implemented on the Python SDK side before starting the Go SDK side, since that work is already in progress.

+1 This is exactly the roadmap that I had in mind - start with externalizing and using the synthetic sources in Python SDK and then proceed with Go. Still worth knowing what's going on there so that's why I asked. :)

Thanks,
Łukasz

czw., 9 maj 2019 o 16:03 Robert Burke <rob...@frantil.com <mailto:rob...@frantil.com>> napisał(a):

    Currently the Go SDK doesn't have cross Language support
    implemented. My recommendation is that we make sure the protocol is
    stable and implemented on the Python SDK side before starting the Go
    SDK side, since that work is already in progress.

      The relevant state of the Go SDK:
    * beam.External exists, for specifying go transforms. (See the Go
    SDK's PubSubIO for an example)
    * the generated go code for the protos, including the Expansions
    service API was refreshed last week. Meaning the work isn't blocked
    on that.

    In principle, the work would be to
    * ensure that the SDK side of job submission connects and looks up
    relevant transforms against the Expansion service if needed, and
    does the appropriate pipeline graph surgery.
       *This may be something that's handled as some kind of hook and
    registration process for generally handling external transforms SDK
    side.
    * Have some kind of external transform to specify and configure on
    the Go side.

    Most of this can be ported from the Python implementation once it's
    stabilized.

    As with all my recommendations on how to do things with the Go SDK,
    feel free to ignore it and forge ahead. I look forward to someone
    tackling this, whenever it happens!

    Your friendly neighborhood distributed gopher wrangler,
    Robert Burke

    Related:
    PR 8531 [1] begins adding automates testing of the Go SDK against
    Flink, which should assist with ensuring this eventual work keeps
    working.

    [1]: https://github.com/apache/beam/pull/8531

    On Thu, May 9, 2019, 6:32 AM Łukasz Gajowy <lgaj...@apache.org
    <mailto:lgaj...@apache.org>> wrote:

        Hi,

        part of our work that needs to be done to create tests for Core
        Apache Beam operations is to enable both batch and streaming
        testing scenarios in all SDKs (not only Java, so lot's of
        portability usage here). I gathered some thoughts on how (I
        think) this could be done at the current state of Beam:

        https://s.apache.org/portable-load-tests

        I added some questions at the end of the doc but I will paste
        them here too for visibility:

          * What is the status of Cross Language Support for Go SDK? Is
            it non-trivial to enable such support (if it doesn't exist yet)?
          * According to other contributors knowledge/experience: I
            noticed that streaming with KafkaIO is currently supported
            by wrapping the ExternalTransform in Python SDK. Do you
            think that streaming pipelines will "just work" with the
            current state of portability if I do the same for
UnboundedSyntheticSource or is there something else missing?
        BTW: great to see Cross Language Support happening. Thanks for
        doing this! :)

        Thanks,
        Łukasz




Reply via email to