I have somehow forgotten this one.

> Basically - I'm trying to keep number of tools at a minimum while still
> providing good support for the functionality we need. Does docker-compose
> provide something beyond the functionality that k8 does? I'm not familiar
> with docker-compose, but looking at https://docs.docker.com/ it doesn't
> seem to provide anything that k8 doesn't already.

I agree that we should keep the set of tools minimal. I mentioned
docker-compose because of one advantage: its installation is trivial
compared to Kubernetes (or even minikube for a local install). Apart
from being easier to install and use, docker-compose has no significant
advantage over Kubernetes.

So it is better to be consistent and go fully with Kubernetes. However,
we need to find a way to help IO authors bootstrap this: in my
experience, creating a cluster with docker-compose takes a YAML file
plus one command, and I am not sure that a basic Kubernetes installation
and run is that easy.

Ismaël

On Wed, Mar 15, 2017 at 8:09 PM, Stephen Sisk <s...@google.com.invalid> wrote:
> thanks for the discussion! In general, I agree with the sentiments
> expressed here. I updated
> https://docs.google.com/document/d/153J9jPQhMCNi_eBzJfhAg-NprQ7vbf1jNVRgdqeEE8I/edit#heading=h.hlirex1vus1a
> to reflect this discussion. (The plan is still that I will put that on
> the website.)
>
> Apache Docker Repository - are you talking about
> https://hub.docker.com/u/apache/ ? If not, can you point me at more info?
> I can't seem to find info about this on the publicly visible apache-infra
> mailing lists that I could find, and the apache infra website doesn't
> seem to mention a docker repository.
>
>> However the current Beam Elasticsearch IO does not support
>> Elasticsearch 5, and elastic does not have an image for version 2, so
>> in this particular case following the priority order we should use the
>> official docker image (2) for the tests (assuming that both require
>> the same version). Do you agree with this?
>
> Yup, that makes sense to me.
>
>> How do we deal with IOs that require more than one base image? This is
>> a common scenario for projects that depend on Zookeeper.
>
> Is there a reason not to just run a kubernetes
> ReplicationController+Service for these cases?
> k8 can easily provide a stable hostname that pods can rely on to reach
> the zookeeper instance. It also uses text config - see
> https://github.com/apache/beam/tree/master/sdks/java/io/jdbc/src/test/resources/kubernetes,
> and sets up the connections/nameservice between the hosts - if other
> tests wanted to rely on postgres, they could just connect to host
> "postgres" and postgres is there.
>
> Basically - I'm trying to keep number of tools at a minimum while still
> providing good support for the functionality we need. Does docker-compose
> provide something beyond the functionality that k8 does? I'm not familiar
> with docker-compose, but looking at
> https://docs.docker.com/compose/overview/#compose-documentation it doesn't
> seem to provide anything that k8 doesn't already.
>
> S
>
> On Wed, Mar 15, 2017 at 7:10 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>
> Hi, Thanks for bringing this subject to the mailing list.
>
> +1
> We definitely need a consensus on this, and I agree with your proposal
> and JB’s comments modulo certain clarifications:
>
> I think we shall go in this priority order if the version of the image
> we want is available:
>
> 1. Image provided by the creator of the data source/sink, if they
>    officially maintain it (this is the case for Elasticsearch, for
>    example), or by the Apache project, if it provides one, as JB
>    mentions.
> 2. Official docker images (because they have security fixes and have
>    guaranteed maintenance).
> 3. Non-official docker images or images from other providers that have
>    good maintainers, e.g. quay.io
>
> It makes sense to use the same image for all the tests, and to use the
> fixed versions supported by the respective IO to avoid possible issues
> during testing between different versions/naming of env variables, etc.
>
> The Elasticsearch case is a 'good' example because it shows all the
> current issues:
>
> We should not use one elasticsearch image (elk) for some tests and a
> different one for the other (the quay one), and if we resolve by
> priority we would take the image provided by the creator (1) for both
> cases.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
> However the current Beam Elasticsearch IO does not support
> Elasticsearch 5, and elastic does not have an image for version 2, so
> in this particular case following the priority order we should use the
> official docker image (2) for the tests (assuming that both require the
> same version).
> Do you agree with this?
>
> Thinking about the ELK image I came up with a new question. How do we
> deal with IOs that require more than one base image? This is a common
> scenario for projects that depend on Zookeeper, e.g. Kafka/Solr.
> Usually people coordinate those with a docker-compose file that creates
> an artificial network to connect the Zookeeper image and the Kafka/Solr
> one just by executing the 'docker-compose up' command. Will we adopt
> this for such cases?
>
> I know that Kubernetes does this too, but the docker-compose format is
> quite easy and textual, and it is usually ready with the docker
> installation. Additionally, the docker-compose files can easily be
> translated with kompose into Kubernetes resources.
>
> Ismaël
>
> On Wed, Mar 15, 2017 at 3:17 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> Hi Stephen,
>>
>> 1. About the docker repositories, we now have an official Docker repo
>> at Apache. So, for the Apache projects, I would recommend the Apache
>> official repo. Anyway, generally speaking, I would recommend the
>> official repo (from the projects).
>>
>> 2. To avoid "unpredictable" breaking changes, I would pin to a
>> particular version, and explicitly update if needed.
>>
>> 3. It's better that docker images are under a unique responsibility
>> scope as different IOs can use the same resources, so they should use
>> the same provided docker.
>>
>> By the way, I also have a docker coming for RedisIO ;)
>>
>> Regards
>> JB
>>
>> On 03/15/2017 08:01 AM, Stephen Sisk wrote:
>>
>>> hi!
>>>
>>> as part of doing the work to enable IO ITs, we decided we want to use
>>> docker. As part of that, we need to run docker images and they'll
>>> probably be pulled from a docker repository.
>>>
>>> Questions:
>>> * What docker repositories (and users on docker hub) do we as a group
>>> allow for images we'll run for hosted data stores?
>>> -> My proposal is we should only use repositories/images that are
>>> regularly updated and that have someone saying that the images we
>>> depend on are secure. In the set of images currently linked to by
>>> checked in code/in PR code, quay.io and official docker images seem
>>> fine. They both have security scans (for what that's worth) and
>>> generally seem okay.
>>>
>>> * Do we pin to particular docker images or allow our version to float?
>>> -> I have seen docker images change in an insecure way (e.g. switching
>>> the name of the password parameter, meaning that the data store was
>>> secure when set up, and became insecure because no password was set
>>> after the image update), so I'd prefer to pin to particular versions,
>>> and update on a periodic basis.
>>>
>>> I'm relatively new to docker best practices, so I'm open to
>>> suggestions on this.
>>>
>>> Current ITs with docker images:
>>> * Jdbc - https://hub.docker.com/_/postgres/ (official image)
>>> * Elasticsearch - https://hub.docker.com/r/sebp/elk/ (semi-official
>>> looking image)
>>> * (PR in-flight
>>> <https://github.com/apache/beam/pull/2193/files#diff-a630b5fff9aebc9e99a3f324c9cf75a9R52>)
>>> HadoopInputFormat's elasticsearch and cassandra tests -
>>> https://hub.docker.com/_/cassandra/ and
>>> https://quay.io/repository/pires/docker-elasticsearch-kubernetes?tag=5.2.2&tab=tags
>>> (official image, and image from quay.io, which provides security
>>> audits of their images)
>>>
>>> The more I think about it, the less I'm excited about the sebp/elk
>>> image - I'm sure it's fine, but I'd prefer using images from a source
>>> that we know is trying to check for security problems.
>>>
>>> There's a secondary problem that we're using two different
>>> elasticsearch images - I'd like to use only one image. I'll follow up
>>> on that - https://issues.apache.org/jira/browse/BEAM-1644
>>>
>>> S
>>>
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
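
The pinning policy proposed in this thread (pin each image to a particular version and update it explicitly, rather than letting it float) could be checked mechanically. What follows is only a minimal Python sketch under stated assumptions, not something from the Beam code base: `image_tag` and `is_pinned` are hypothetical helper names, and the pull references in the loop are assumptions derived from the image URLs quoted above (the quay.io reference reuses the 5.2.2 tag visible in its URL; `cassandra:latest` is purely illustrative).

```python
from typing import Optional


def image_tag(image_ref: str) -> Optional[str]:
    """Return the tag or digest of a Docker image reference, or None if absent."""
    # A digest (name@sha256:...) identifies the exact image content.
    name, sep, digest = image_ref.partition("@")
    if sep:
        return digest
    # Only the last path component can carry a tag; an earlier colon
    # belongs to a registry port such as localhost:5000.
    last_component = name.rsplit("/", 1)[-1]
    _, sep, tag = last_component.partition(":")
    return tag if sep and tag else None


def is_pinned(image_ref: str) -> bool:
    """Treat a reference as pinned if it has a tag other than 'latest' or a digest."""
    tag = image_tag(image_ref)
    return tag is not None and tag != "latest"


if __name__ == "__main__":
    # Hypothetical references, assumed from the URLs mentioned in the thread.
    for ref in (
        "postgres",
        "sebp/elk",
        "cassandra:latest",
        "quay.io/pires/docker-elasticsearch-kubernetes:5.2.2",
    ):
        print(ref, "pinned" if is_pinned(ref) else "floating")
```

A check like this only shows that a version is named. A tag can still be re-pushed by the image maintainer, so a digest is the stricter form of pinning.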