This will probably become a common question from IO transform authors as
Beam matures. Perhaps we should add a section on this to the IO authoring
guide [1][2]?

Thanks,
Cham

[1] https://beam.apache.org/documentation/io/authoring-overview/
[2] https://issues.apache.org/jira/browse/BEAM-1025

On Fri, Jun 23, 2017 at 2:57 AM Jean-Baptiste Onofré <[email protected]>
wrote:

> Hi,
>
> It's something we already discussed in the past (for Kafka, for instance).
>
> For Kafka, we were able to use a single IO with spring-el to detect the
> version.
> That's certainly the preferred approach, but it would not be possible in
> all cases.
>
> I would suggest, if the first approach doesn't work:
>
> * In terms of Maven modules:
>
> - sdk/java/io/elasticsearch/common that could contain shared code + itests
> - sdk/java/io/elasticsearch/2.x (artifactId elasticsearch-2.x), specific
>   code + utests
> - sdk/java/io/elasticsearch/5.x (artifactId elasticsearch-5.x), specific
>   code + utests
>
> Regards
> JB
>
> On 06/23/2017 11:51 AM, Etienne Chauchot wrote:
> > Hi guys,
> >
> > I'm working on Elasticsearch 5.x support for Beam IO (it only supports
> > Elasticsearch 2.x right now). I wanted to have your opinion on some
> > points related to maintenance.
> >
> > In this ES case a big part of the code of the IO is common between ES
> > v2.x and ES v5.x. Still, there are some differences:
> >
> > - initialization of UT (change in embedded test framework)
> >
> > - Minor differences in one message format
> >
> > - New feature that will allow improving the split or new feature that is
> >   worth leveraging (ES pipelines)
> >
> >
> > => The question is: what do you think is the best way to architect the
> > IO to reduce maintenance?
> >
> >
> > 1. We could have an elasticsearchio-common package and two packages that
> > are specific to each version of the backend. But I find it confusing for
> > the users to have separate packages and more complex to maintain for us.
> >
> > 2. I'm more in favor of detecting the version at IO initialization time
> > and then, in the parts that are different, doing a simple
> > if (version == x). But it will make code paths more complex. Note that,
> > for example, the es-hadoop project (ES connectors for big data engines)
> > chose this approach.
> >
> >
> > Another thing related to unit tests: in fact they are closer to
> > integration tests as they use an embedded backend server. I did it that
> > way because I wanted to unit test things like split that require a real
> > instance.
> > => What is the recommended way of testing on both supported versions,
> > knowing that both the test code and the test dependencies are different?
> >
> > For integration tests (they are mainly used for load testing), the test
> > code and the test dependencies are the same between versions because
> > there is no embedded ES. So they only need to be run twice, once against
> > each version of the backend.
> >
> >
> > What do you think?
> >
> > PS: sorry for the long email :)
> >
> > Best!
> > Etienne
> >
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
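
For illustration, option 2 from Etienne's message (detect the backend
version once when the IO is initialized, then branch where the versions
differ) might look roughly like the sketch below. It is not taken from
ElasticsearchIO or es-hadoop: the class BackendVersion, its methods and the
format labels are hypothetical, and the version string is assumed to have
already been fetched from the cluster.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of ElasticsearchIO. */
final class BackendVersion {
  // Leading digits of a version string such as "5.4.1".
  private static final Pattern MAJOR = Pattern.compile("^(\\d+)");

  private BackendVersion() {}

  /** Extracts the major version from a version string such as "5.4.1". */
  static int parseMajor(String versionNumber) {
    Matcher m = MAJOR.matcher(versionNumber.trim());
    if (!m.find()) {
      throw new IllegalArgumentException("Unrecognized version: " + versionNumber);
    }
    return Integer.parseInt(m.group(1));
  }

  /**
   * Stands in for one version-specific code path. The returned labels are
   * placeholders and do not describe real Elasticsearch message formats.
   */
  static String messageFormatFor(int major) {
    if (major == 2) {
      return "v2-format";
    } else if (major == 5) {
      return "v5-format";
    }
    throw new IllegalArgumentException("Unsupported major version: " + major);
  }

  public static void main(String[] args) {
    // Assumption: "5.4.1" is what the cluster reported at initialization.
    int major = parseMajor("5.4.1");
    System.out.println(major + " -> " + messageFormatFor(major));
  }
}
```

Resolving the major version once and passing it around keeps the version
checks in one place, though, as noted in the thread, each additional branch
makes the code paths harder to follow.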
