ASF GitHub Bot commented on BEAM-1637:

GitHub user echauchot opened a pull request:


    [BEAM-1637] Create Elasticsearch IO compatible with ES 5.x

    Follow this checklist to help us incorporate your contribution quickly and easily:
     - [X] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
     - [X] Each commit in the pull request should have a meaningful subject 
line and body.
     - [X] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue.
     - [X] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
     - [X] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
     - [X] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
    R: @jkff 
    CC: @jbonofre 
    Some comments about this pull request:
    1. As discussed on the mailing list, the architecture consists of a common 
module plus one module per Elasticsearch version (the version modules differ in 
features and also in unit tests). The version modules all use the same package 
name for backward compatibility (pipeline code stays exactly the same). Classes 
in the common package shall not be used directly by users. 
In a previous design, the common module used the same Java package name as the 
version modules so that common classes could be package-private. I abandoned 
that design because of javadoc generation problems: the common module then had 
no public classes, and excluding that package was not possible without losing 
the ES javadoc entirely. So in the end, I put the common classes in a separate 
common package with public visibility and a javadoc warning stating that they 
shall not be used by pipeline authors. If you have a better suggestion, I'm all 
ears :)
    2. I could not use inheritance because of static members, so I used 
composition instead. If you have a better design, feel free to comment.
    3. There is a very hacky workaround in the JarHell class. The problem was 
that the surefire dependencies brought a duplicate class into the classpath, 
which caused the JarHell detection to fail the build. Please read the javadoc of 
this class. If you have any other suggestion to avoid the JarHell problem, feel 
free to comment.
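A minimal sketch of points 1 and 2, assuming everything lives in a single Java file (so the common package and the version package are collapsed into one). `ElasticsearchCommonSketch` stands in for a public class of the common package, and `ElasticsearchIOSketch` for the version-specific entry point that every version module would expose under the same package and class name. Both names, the hard-coded major version 5 and the endpoint helper are hypothetical; this is not the code of the pull request. It reads "because of static members" as referring to static factory methods such as `read()`, which a subclass can hide but not override, so the version module delegates to the common helper rather than extending it.

```java
/**
 * Hypothetical stand-in for a class in the common package. In the design described
 * above it is public (so javadoc can be generated) but NOT meant for pipeline authors.
 */
final class ElasticsearchCommonSketch {
  private final int backendMajorVersion;

  ElasticsearchCommonSketch(int backendMajorVersion) {
    this.backendMajorVersion = backendMajorVersion;
  }

  /** Shared logic used by every version module: builds a search endpoint. */
  String searchEndpoint(String index, String type) {
    return String.format("/%s/%s/_search", index, type);
  }

  int backendMajorVersion() {
    return backendMajorVersion;
  }
}

/**
 * Hypothetical version-specific entry point. Each version module would ship a class
 * with the same name in the same package, so pipeline code does not change.
 */
final class ElasticsearchIOSketch {
  // Static factories like read() cannot be overridden (a subclass only hides them),
  // so instead of inheriting from a common base class, this module composes the
  // shared helper, configured for its own Elasticsearch major version.
  private static final ElasticsearchCommonSketch COMMON = new ElasticsearchCommonSketch(5);

  private ElasticsearchIOSketch() {}

  static Read read(String index, String type) {
    return new Read(index, type);
  }

  static final class Read {
    private final String index;
    private final String type;

    private Read(String index, String type) {
      this.index = index;
      this.type = type;
    }

    /** Delegates to the shared helper instead of inheriting from it. */
    String searchEndpoint() {
      return COMMON.searchEndpoint(index, type);
    }

    int backendMajorVersion() {
      return COMMON.backendMajorVersion();
    }
  }
}
```

Pipeline code would call `ElasticsearchIOSketch.read("tweets", "tweet")` whichever version module is on the classpath; only the way each module wires its helper differs.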
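To illustrate why a single duplicated class fails the build in point 3, here is a simplified sketch of the kind of duplicate-class check that Elasticsearch's `org.elasticsearch.bootstrap.JarHell` performs. It is not Elasticsearch's implementation and not the workaround from this pull request (that one is documented in the JarHell class javadoc mentioned above); the class name `DuplicateClassCheckSketch` and the model of a classpath as a map from jar name to contained class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified duplicate-class check. Each classpath entry is modeled
 * as a jar name mapped to the fully qualified class names it contains.
 */
final class DuplicateClassCheckSketch {
  private DuplicateClassCheckSketch() {}

  /** Returns one message per class found in more than one jar; empty if none. */
  static List<String> findDuplicates(Map<String, List<String>> classpath) {
    Map<String, String> firstSeenIn = new LinkedHashMap<>();
    List<String> duplicates = new ArrayList<>();
    for (Map.Entry<String, List<String>> jar : classpath.entrySet()) {
      for (String className : jar.getValue()) {
        // putIfAbsent returns the jar that already provided this class, or null.
        String previousJar = firstSeenIn.putIfAbsent(className, jar.getKey());
        if (previousJar != null) {
          duplicates.add(className + " in " + previousJar + " and " + jar.getKey());
        }
      }
    }
    return duplicates;
  }

  /** Strict mode: any duplicate aborts, which is what made the build fail. */
  static void check(Map<String, List<String>> classpath) {
    List<String> duplicates = findDuplicates(classpath);
    if (!duplicates.isEmpty()) {
      throw new IllegalStateException("jar hell! " + duplicates);
    }
  }
}
```

Because such a check has no "ignore" switch, a duplicate coming from a test-only dependency (here, surefire) is enough to abort, which is why a workaround in the JarHell class was needed at all.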

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/echauchot/beam BEAM-1637-ELASTICSEARCH-5

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3703

commit e623e8a529ab687b267685ef040e6059f61caa08
Author: Etienne Chauchot <echauc...@gmail.com>
Date:   2017-06-26T08:58:21Z

    [BEAM-1637] Create Elasticsearch IO compatible with ES 5.x


> Create Elasticsearch IO compatible with ES 5.x
> ----------------------------------------------
>                 Key: BEAM-1637
>                 URL: https://issues.apache.org/jira/browse/BEAM-1637
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-extensions
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
>            Priority: Minor
> The current Elasticsearch IO (see 
> https://issues.apache.org/jira/browse/BEAM-425) is only compatible with 
> Elasticsearch v2.x. The aim is to have an IO compatible with ES v5.x. 
> Beyond being able to address v5.x Elasticsearch instances, we could also 
> leverage the Elasticsearch pipeline API and split the dataset better (with 
> bundles as close as possible to desiredBundleSize) thanks to the new ES 
> split API, which allows splitting ES shards.
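A minimal sketch of how the ticket's splitting idea could work, assuming the "new ES split API" refers to the sliced scroll feature of ES 5.x, where a scroll search body carries a `slice` object with an `id` and a `max` (number of slices). It derives the number of slices from an estimated index size and `desiredBundleSize`, then builds one search body per slice. The class and method names, and the idea that the index size estimate is already available, are hypothetical; this is not the implementation of the IO.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split an index read into ES 5.x sliced-scroll requests. */
final class SlicedScrollSplitSketch {
  private SlicedScrollSplitSketch() {}

  /** Ceiling of size / desiredBundleSizeBytes, so each slice is close to the target (min 1). */
  static int numSlices(long estimatedIndexSizeBytes, long desiredBundleSizeBytes) {
    if (desiredBundleSizeBytes <= 0) {
      throw new IllegalArgumentException("desiredBundleSizeBytes must be positive");
    }
    long n = (estimatedIndexSizeBytes + desiredBundleSizeBytes - 1) / desiredBundleSizeBytes;
    return (int) Math.max(1, n);
  }

  /** One search body per slice; a single bundle needs no slice clause at all. */
  static List<String> sliceQueries(String queryJson, int numSlices) {
    List<String> bodies = new ArrayList<>();
    if (numSlices <= 1) {
      bodies.add(String.format("{\"query\":%s}", queryJson));
      return bodies;
    }
    for (int id = 0; id < numSlices; id++) {
      // Each source would read slice `id` out of `numSlices` through the scroll API.
      bodies.add(String.format(
          "{\"slice\":{\"id\":%d,\"max\":%d},\"query\":%s}", id, numSlices, queryJson));
    }
    return bodies;
  }
}
```

For example, an index estimated at 100 bytes with a desired bundle size of 30 bytes yields 4 slices. Since slices can cut across shards, the resulting bundles can track `desiredBundleSize` more closely than a one-bundle-per-shard split.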

This message was sent by Atlassian JIRA
