The Java deps are only half of the problem. The other half is that PubsubIO
and KafkaIO are using classes that do not have a python equivalent and thus
no universal coder.  The solution discussed in the issue I linked above was
to use row coder registries in Java, to convert from these types to rows /
schemas.

Any thoughts on that?

-chad


On Wed, Feb 19, 2020 at 6:00 PM Robert Bradshaw <rober...@google.com> wrote:

> Hopefully this should be resovled by
> https://issues.apache.org/jira/browse/BEAM-9229
>
> On Wed, Feb 19, 2020 at 5:52 PM Chad Dombrova <chad...@gmail.com> wrote:
> >
> > We are using external transforms to get access to PubSubIO within
> python.  It works well, but there is one major issue remaining to fix:  we
> have to build a custom beam with a hack to add the PubSubIO java deps and
> fix up the coders.  This affects KafkaIO as well.  There's an issue here:
> https://issues.apache.org/jira/browse/BEAM-7870
> >
> > I consider this to be the most pressing problem with external transforms
> right now.
> >
> > -chad
> >
> >
> >
> > On Wed, Feb 12, 2020 at 9:28 AM Chamikara Jayalath <chamik...@google.com>
> wrote:
> >>
> >>
> >>
> >> On Wed, Feb 12, 2020 at 8:10 AM Alexey Romanenko <
> aromanenko....@gmail.com> wrote:
> >>>
> >>>
> >>>> AFAIK, there's no official guide for cross-language pipelines. But
> there are examples and test cases you can use as reference such as:
> >>>>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount_xlang.py
> >>>>
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIOExternalTest.java
> >>>>
> https://github.com/apache/beam/blob/master/runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ValidateRunnerXlangTest.java
> >>>>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/expansion_service_test.py
> >>>
> >>>
> >>> I'm trying to work with tech writers to add more documentation related
> to cross-language (in a few months). But any help related to documenting
> what we have now is greatly appreciated.
> >>>
> >>>
> >>> That would be great since now the information is a bit scattered over
> different places. I’d be happy to help with any examples and their testing
> that I hope I’ll have after a while.
> >>
> >>
> >> Great.
> >>
> >>>
> >>>
> >>>> The runner and SDK supports are in working state I could say but not
> many IOs expose their cross-language interface yet (you can easily write
> cross-language configuration for any Python transforms by yourself though).
> >>>
> >>>
> >>> Should mention here the test suites for portable Flink and Spark
> Heejong added recently :)
> >>>
> >>>
> https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_XVR_Flink/
> >>>
> https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_XVR_Spark/
> >>>
> >>>
> >>> Nice! Looks like my question above about cross-language support in
> Spark runner was redundant.
> >>>
> >>>
> >>>>
> >>>>
> >>>>>
> >>>>> - Is the information here
> https://beam.apache.org/roadmap/connectors-multi-sdk/ up-to-date? Are
> there any other entry points you can recommend?
> >>>>
> >>>>
> >>>> I think it's up-to-date.
> >>>
> >>>
> >>> Mostly up to date.  Testing status is more complete now and we are
> actively working on getting the dependences story correct and adding
> support for DataflowRunner.
> >>>
> >>>
> >>> Are there any “umbrella" Jiras regarding cross-language support that I
> can track?
> >>
> >>
> >> I don't think we have an umbrella JIRA currently. I can create one and
> mention it in the roadmap.
>

Reply via email to