hey Ismael,
I appreciate you asking questions to make sure we're doing the right things
to help developers out and to make it easy to add more IO ITs.

I strongly agree that we need to make sure we have documentation that
clearly lays out how to get it working.

The setup & teardown scripts for postgres in jdbc's
src/test/resources/kubernetes directory *should* work on a vanilla
kubernetes cluster (it's how I set them up) - I deliberately did not do
anything fancy when creating my kubernetes cluster. Probably the only thing
that I know of that might be tricky is that the script is currently set up
so that it only exposes the postgres service on Node (VM) IPs - that probably
needs documentation on how to use it with the tests. (basically, you should
be able to take the IP address of any of the k8 VMs and use that as the IP
address of postgres - k8 will proxy that over to the correct container.)
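
To make the node IP idea concrete, here is a rough sketch (not part of the
scripts in the repo). It assumes you have saved the output of
`kubectl get nodes -o json`, and that the postgres service is reachable on a
fixed port on every node. The helper names, the sample addresses, the port
30432 and the database name "postgres" are all made up for illustration.

```python
import json


def node_addresses(nodes_json, address_type="ExternalIP"):
    """Return every node address of the given type from `kubectl get nodes -o json` output."""
    nodes = json.loads(nodes_json)
    return [
        addr["address"]
        for node in nodes.get("items", [])
        for addr in node.get("status", {}).get("addresses", [])
        if addr.get("type") == address_type
    ]


def postgres_jdbc_url(node_ip, node_port, database="postgres"):
    """Build a JDBC URL that reaches postgres through a node's IP (hypothetical helper)."""
    return f"jdbc:postgresql://{node_ip}:{node_port}/{database}"


# Made-up sample in the shape of `kubectl get nodes -o json` output.
SAMPLE_NODES = json.dumps({
    "items": [
        {"status": {"addresses": [
            {"type": "InternalIP", "address": "10.0.0.2"},
            {"type": "ExternalIP", "address": "203.0.113.10"},
        ]}},
        {"status": {"addresses": [
            {"type": "InternalIP", "address": "10.0.0.3"},
            {"type": "ExternalIP", "address": "203.0.113.11"},
        ]}},
    ]
})

if __name__ == "__main__":
    # Any node IP should work, since k8 proxies to the right container.
    ips = node_addresses(SAMPLE_NODES)
    print(postgres_jdbc_url(ips[0], 30432))  # 30432 is an assumed port
```

Whichever address you pick, the resulting URL is what you would hand to the
test as the postgres location.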

I added a few rough notes here in the testing doc:
https://docs.google.com/document/d/153J9jPQhMCNi_eBzJfhAg-NprQ7vbf1jNVRgdqeEE8I/edit#heading=h.l06g9u1ejw4l


I'm definitely interested to hear what questions you have.

S

On Wed, Mar 22, 2017 at 3:56 PM Ismaël Mejía <ieme...@apache.org> wrote:

> You have really good points, I agree 100%, docker is easier if it is
> local, once we talk about distributions all of them have their
> pros/cons. I don’t intend to reopen the discussion and of course it
> would be silly to go back and remake all the work you already have
> done.
>
> We already agreed on kubernetes and this is it. My point in mentioning
> docker-compose was more that we need to make life easier for IT test
> contributors, and maybe adding an extra tool is not the way,
> but at least we will need better documentation or references to help
> developers bootstrap their Kubernetes so they can contribute and
> validate the tests on their own.
>
> On Wed, Mar 22, 2017 at 12:14 AM, Stephen Sisk <s...@google.com.invalid>
> wrote:
> > Hey Ismael,
> >
> > I definitely agree with you that we want something that developers will
> > actually be able to/want to use.
> >
> > In my experience *all* the container orchestration engines are non-trivial
> > to set up. When I started examining solutions for beam hosting, I did
> > installs of mesos, kubernetes and docker. Docker is easier in the "run only
> > on my local machine" case if devs have it set up, but to do anything
> > interesting (i.e., interact with machines that aren't already yours), they
> > all involve work to get them set up on each machine you want to use[4].
> >
> > Kubernetes has some options that make it extremely simple to set up - both
> > AWS[2] and GCE[3] seem to be straightforward to set up for simple dev
> > clusters, with scripts to automate the process (I'm assuming docker has
> > similar setups.)
> >
> > Once kubernetes is set up, it's also a simple yaml file + command to set up
> > multiple machines. The kubernetes setup for postgres[5] shows a simple
> > one-machine example, and the kubernetes setups for HIFIO[6] show
> > multi-machine examples.
> >
> > We've spent a lot of time discussing the various options - when we talked
> > about this earlier [1] we decided we would move forward with investigating
> > kubernetes, so that's what I used for the IO ITs work I've been doing,
> > which we've now gotten working.
> >
> > Do you feel the advantages of docker are such that we should re-open the
> > discussion and potentially re-do the work we've done so far to get k8
> > working?
> >
> > I took a genuine look at docker earlier in the process and it didn't seem
> > like it was better than the other options in any dimensions (other than
> > "developers usually have it installed already"), and kubernetes/mesos
> > seemed to be more stable/have more of the features discussed in [1].
> > Perhaps that's changed?
> >
> > I think we are just starting to use container orchestration engines, and so
> > while I don't want to throw away the work we've done so far, I also don't
> > want to have to do it later if there are reasons we knew about now. :)
> >
> > S
> >
> > [1] https://lists.apache.org/thread.html/9fd3c51cb679706efa4d0df2111a6ac438b851818b639aba644607af@%3Cdev.beam.apache.org%3E
> > [2] k8 AWS - https://kubernetes.io/docs/getting-started-guides/aws/
> > [3] k8 GKE - https://cloud.google.com/container-engine/docs/quickstart or
> > https://kubernetes.io/docs/getting-started-guides/gce/
> > [4] docker swarm on GCE - https://rominirani.com/docker-swarm-on-google-compute-engine-364765b400ed#.gzvruzis9
> > [5] postgres k8 script - https://github.com/apache/beam/tree/master/sdks/java/io/jdbc/src/test/resources/kubernetes
> > [6] https://github.com/diptikul/incubator-beam/tree/HIFIO-CS-ES/sdks/java/io/hadoop/jdk1.8-tests/src/test/resources/kubernetes
> >
> >
> > On Mon, Mar 20, 2017 at 3:25 PM Ismaël Mejía <ieme...@apache.org> wrote:
> >
> > I have somehow forgotten this one.
> >
> >> Basically - I'm trying to keep number of tools at a minimum while still
> >> providing good support for the functionality we need. Does docker-compose
> >> provide something beyond the functionality that k8 does? I'm not familiar
> >> with docker-compose, but looking at
> >> https://docs.docker.com/ it doesn't
> >> seem to provide anything that k8 doesn't already.
> >
> > I agree we should have the most minimal set of tools. I mentioned
> > docker-compose because its installation is trivial compared to
> > kubernetes (or even minikube for a local install); docker-compose does
> > not have any significant advantage over kubernetes apart from being
> > easier to install/use.
> >
> > But well, better to be consistent and go full with kubernetes, however
> > we need to find a way to help IO authors to bootstrap this, because
> > from my experience creating a cluster with docker-compose is a yaml
> > file + a command, not sure if the basic installation and run of
> > kubernetes is that easy.
> >
> > Ismaël
> >
> > On Wed, Mar 15, 2017 at 8:09 PM, Stephen Sisk <s...@google.com.invalid>
> > wrote:
> >> Thanks for the discussion! In general, I agree with the sentiments
> >> expressed here. I updated
> >> https://docs.google.com/document/d/153J9jPQhMCNi_eBzJfhAg-NprQ7vbf1jNVRgdqeEE8I/edit#heading=h.hlirex1vus1a
> >> to reflect this discussion. (The plan is still that I will put that on the
> >> website.)
> >>
> >> Apache Docker Repository - are you talking about
> >> https://hub.docker.com/u/apache/ ? If not, can you point me at more info? I
> >> can't seem to find info about this on the publicly visible apache-infra
> >> mailing lists that I could find, and the apache infra website doesn't seem
> >> to mention a docker repository.
> >>
> >>
> >>
> >>> However the current Beam Elasticsearch IO does not support Elasticsearch
> >>> 5, and elastic does not have an image for version 2, so in this
> >>> particular case following the priority order we should use the official
> >>> docker image (2) for the tests (assuming that both require the same
> >>> version). Do you agree with this ?
> >>
> >> Yup, that makes sense to me.
> >>
> >>
> >>
> >>> How do we deal with IOs that require more than one base image, this is a
> >>> common scenario for projects that depend on Zookeeper?
> >>
> >> Is there a reason not to just run a kubernetes ReplicationController+Service
> >> for these cases? k8 can easily support having a hostname that pods can rely
> >> on to reach the zookeeper instance. It also uses text config - see
> >> https://github.com/apache/beam/tree/master/sdks/java/io/jdbc/src/test/resources/kubernetes ,
> >> and sets up the connections/nameservice between the hosts - if other tests
> >> wanted to rely on postgres, they could just connect to host "postgres" and
> >> postgres is there.
> >>
> >> Basically - I'm trying to keep number of tools at a minimum while still
> >> providing good support for the functionality we need. Does docker-compose
> >> provide something beyond the functionality that k8 does? I'm not familiar
> >> with docker-compose, but looking at
> >> https://docs.docker.com/compose/overview/#compose-documentation it doesn't
> >> seem to provide anything that k8 doesn't already.
> >>
> >>
> >> S
> >>
> >> On Wed, Mar 15, 2017 at 7:10 AM Ismaël Mejía <ieme...@gmail.com> wrote:
> >>
> >> Hi, Thanks for bringing this subject to the mailing list.
> >>
> >> +1
> >> We definitely need a consensus on this, and I agree with your proposal and
> >> JB’s comments modulo certain clarifications:
> >>
> >> I think we shall go in this priority order if the version of the image we
> >> want is available:
> >>
> >> 1. Image provided by the creator of the data source/sink (if they
> >> officially maintain it; this is the case for Elasticsearch, for example) or
> >> by the Apache project (if it provides one), as JB mentions.
> >> 2. Official docker images (because they have security fixes and have
> >> guaranteed maintenance).
> >> 3. Non-official docker images or images from other providers that have good
> >> maintainers, e.g. quay.io
> >>
> >> It makes sense to use the same image for all the tests, and to use the
> >> fixed versions supported by the respective IO to avoid possible issues
> >> during testing between different versions/naming of env variables, etc.
> >>
> >> The Elasticsearch case is a 'good' example because it shows all the current
> >> issues:
> >>
> >> We should not use one elasticsearch image (elk) for some tests and a
> >> different one for the others (the quay one), and if we resolve by priority
> >> we would take the image provided by the creator (1) for both cases.
> >> https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
> >> However the current Beam Elasticsearch IO does not support Elasticsearch 5,
> >> and elastic does not have an image for version 2, so in this particular
> >> case following the priority order we should use the official docker image
> >> (2) for the tests (assuming that both require the same version).
> >> Do you agree with this ?
> >>
> >>
> >> Thinking about the ELK image I came up with a new question. How do we deal
> >> with IOs that require more than one base image? This is a common scenario
> >> for projects that depend on Zookeeper, e.g. Kafka/Solr. Usually people
> >> coordinate those with a docker-compose file that creates an artificial
> >> network to connect the Zookeeper image and the Kafka/Solr one
> >> by just executing the 'docker-compose up' command.
> >> Will we adopt this for such cases ?
> >>
> >> I know that Kubernetes does this too, but the docker-compose format is
> >> quite easy and textual,
> >> and it is usually ready with the docker installation, additionally the
> >> docker-compose files can easily be translated with kompose into Kubernetes
> >> resources.
> >>
> >> Ismaël
> >>
> >> On Wed, Mar 15, 2017 at 3:17 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
> >> wrote:
> >>
> >>> Hi Stephen,
> >>>
> >>> 1. About the docker repositories, we now have an official Docker repo at
> >>> Apache. So, for the Apache projects, I would recommend the Apache official
> >>> repo. Anyway, generally speaking, I would recommend the official repo (from
> >>> the projects).
> >>>
> >>> 2. To avoid "unpredictable" breaking changes, I would pin to a particular
> >>> version, and explicitly update if needed.
> >>>
> >>> 3. It's better that docker images are under a unique responsibility scope
> >>> as different IOs can use the same resources, so they should use the same
> >>> provided docker.
> >>>
> >>> By the way, I also have a docker coming for RedisIO ;)
> >>>
> >>> Regards
> >>> JB
> >>>
> >>>
> >>> On 03/15/2017 08:01 AM, Stephen Sisk wrote:
> >>>
> >>>> hi!
> >>>>
> >>>> as part of doing the work to enable IO ITs, we decided we want to use
> >>>> docker. As part of that, we need to run docker images and they'll probably
> >>>> be pulled from a docker repository.
> >>>>
> >>>> Questions:
> >>>> * What docker repositories (and users on docker hub) do we as a group allow
> >>>> for images we'll run for hosted data stores?
> >>>>  -> My proposal is we should only use repositories/images that are
> >>>> regularly updated and that have someone saying that the images we depend on
> >>>> are secure. In the set of images currently linked to by checked in code/in
> >>>> PR code, quay.io and official docker images seem fine. They both have
> >>>> security scans (for what that's worth) and generally seem okay.
> >>>>
> >>>> * Do we pin to particular docker images or allow our version to float?
> >>>>  -> I have seen docker images change in insecure ways (e.g. switching the
> >>>> name of the password parameter, meaning that the data store was secure when
> >>>> set up, and became insecure because no password was set after the image
> >>>> update), so I'd prefer to pin to particular versions, and update on a
> >>>> periodic basis.
> >>>>
> >>>> I'm relatively new to docker best practices, so I'm open to suggestions on
> >>>> this.
> >>>>
> >>>> Current ITs with docker images:
> >>>> * Jdbc - https://hub.docker.com/_/postgres/  (official image)
> >>>> * Elasticsearch - https://hub.docker.com/r/sebp/elk/ (semi-official looking
> >>>> image)
> >>>> * (PR in-flight
> >>>> <https://github.com/apache/beam/pull/2193/files#diff-a630b5fff9aebc9e99a3f324c9cf75a9R52>)
> >>>> HadoopInputFormat's elasticsearch and cassandra tests -
> >>>> https://hub.docker.com/_/cassandra/ and
> >>>> https://quay.io/repository/pires/docker-elasticsearch-kubernetes?tag=5.2.2&tab=tags
> >>>> (official image, and image from quay.io, which provides security audits of
> >>>> their images)
> >>>>
> >>>> The more I think about it, the less I'm excited about the sebp/elk image -
> >>>> I'm sure it's fine, but I'd prefer using images from a source that we know
> >>>> is trying to check for security problems.
> >>>>
> >>>> There's a secondary problem that we're using two different elasticsearch
> >>>> images - I'd like to use only one image. I'll follow up on that -
> >>>> https://issues.apache.org/jira/browse/BEAM-1644
> >>>>
> >>>> S
> >>>>
> >>>>
> >>> --
> >>> Jean-Baptiste Onofré
> >>> jbono...@apache.org
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
> >>>
>
