This will probably be a common question from IO transform authors as Beam matures. Perhaps we should add a section on this to the IO authoring guide [1][2]?
Thanks,
Cham

[1] https://beam.apache.org/documentation/io/authoring-overview/
[2] https://issues.apache.org/jira/browse/BEAM-1025

On Fri, Jun 23, 2017 at 2:57 AM Jean-Baptiste Onofré <[email protected]> wrote:

> Hi,
>
> It's something we already discussed in the past (for Kafka, for instance).
>
> For Kafka, we were able to use a single IO with spring-el to detect the
> version. That's certainly the preferred approach, but it would not be
> possible in all cases.
>
> If the first approach doesn't work, I would suggest the following in terms
> of Maven modules:
>
> - sdk/java/io/elasticsearch/common, which could contain shared code + itests
> - sdk/java/io/elasticsearch/2.x (artifactId elasticsearch-2.x), specific
>   code + utests
> - sdk/java/io/elasticsearch/5.x (artifactId elasticsearch-5.x), specific
>   code + utests
>
> Regards
> JB
>
> On 06/23/2017 11:51 AM, Etienne Chauchot wrote:
> > Hi guys,
> >
> > I'm working on Elasticsearch 5.x support for Beam IO (it only supports
> > Elasticsearch 2.x right now). I wanted to get your opinion on some points
> > related to maintenance.
> >
> > In this ES case, a big part of the IO code is common between ES v2.x and
> > ES v5.x. Still, there are some differences:
> >
> > - initialization of UT (change in the embedded test framework)
> >
> > - minor differences in one message format
> >
> > - a new feature that will allow improving the split, or a new feature
> >   worth leveraging (ES pipelines)
> >
> > => Question is: what do you think is the best way to architect the IO to
> > reduce maintenance?
> >
> > 1. We could have an elasticsearchio-common package and two packages that
> > are specific to each version of the backend. But I find separate packages
> > confusing for users, and more complex for us to maintain.
> >
> > 2. I'm more in favor of detecting the version at IO initialization time
> > and then, in the parts that are different, doing a simple if (version == x).
> > But it will make code paths more complex. Note that, for example, the
> > es-hadoop project (ES connectors for big data engines) chose this approach.
> >
> > Another thing related to unit tests: they are actually closer to
> > integration tests, as they use an embedded backend server. I did it that
> > way because I wanted to unit test things like split that require a real
> > instance.
> > => What is the recommended way of testing on both supported versions,
> > given that both the test code and the test dependencies are different?
> >
> > For integration tests (they are mainly used for load testing), the test
> > code and the test dependencies are the same between versions because there
> > is no embedded ES. So they will only need to be run twice, against two
> > versions of the backend.
> >
> > What do you think?
> >
> > PS: sorry for the long email :)
> >
> > Best!
> > Etienne
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
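Etienne's option 2 (detect the version once at IO initialization, then branch only where 2.x and 5.x differ) could look roughly like the Java sketch below. It is a minimal illustration under stated assumptions, not Beam's ElasticsearchIO API: `BackendVersionSketch`, `EsMajorVersion`, `fromVersionNumber` and `supportsIngestPipelines` are hypothetical names, and the version string is assumed to have been fetched already (for example from the `version.number` field Elasticsearch returns from its root endpoint). The only version-specific behavior shown is ingest pipeline support, which Elasticsearch introduced in 5.0.

```java
import java.util.Locale;

/**
 * Hypothetical sketch: resolve the backend version once, then branch only
 * where 2.x and 5.x behave differently. Not part of Beam's ElasticsearchIO.
 */
public final class BackendVersionSketch {

  /** Major versions this hypothetical IO supports. */
  enum EsMajorVersion {
    V2, V5;

    /** Parses a version string such as "2.4.5" or "5.4.1" into a supported major version. */
    static EsMajorVersion fromVersionNumber(String versionNumber) {
      if (versionNumber == null || versionNumber.trim().isEmpty()) {
        throw new IllegalArgumentException("Missing backend version");
      }
      String trimmed = versionNumber.trim();
      // Keep everything before the first '.', or the whole string if there is none.
      int dot = trimmed.indexOf('.');
      String major = dot < 0 ? trimmed : trimmed.substring(0, dot);
      switch (major) {
        case "2":
          return V2;
        case "5":
          return V5;
        default:
          throw new IllegalArgumentException(
              String.format(Locale.ROOT, "Unsupported Elasticsearch version: %s", versionNumber));
      }
    }
  }

  private final EsMajorVersion version;

  /** The version is resolved once, at IO initialization time (assumed already fetched). */
  BackendVersionSketch(String versionNumber) {
    this.version = EsMajorVersion.fromVersionNumber(versionNumber);
  }

  /** The "if (version == x)" check lives here instead of being spread through the IO. */
  boolean supportsIngestPipelines() {
    // Ingest pipelines exist only from Elasticsearch 5.0 onward.
    return version == EsMajorVersion.V5;
  }

  public static void main(String[] args) {
    System.out.println(new BackendVersionSketch("2.4.5").supportsIngestPipelines()); // false
    System.out.println(new BackendVersionSketch("5.4.1").supportsIngestPipelines()); // true
  }
}
```

Keeping each version check behind a small, named method like this is one way to contain the code-path complexity mentioned for option 2; the Maven module split JB suggests would instead move each branch into its own version-specific artifact.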
