I have somehow forgotten this one.

> Basically - I'm trying to keep number of tools at a minimum while still
> providing good support for the functionality we need. Does docker-compose
> provide something beyond the functionality that k8 does? I'm not familiar
> with docker-compose, but looking at
> https://docs.docker.com/ it doesn't
> seem to provide anything that k8 doesn't already.

I agree that we should keep the set of tools minimal. I mentioned
docker-compose because its installation is trivial compared to
Kubernetes (or even minikube for a local install). Apart from being
easier to install and use, docker-compose does not have any significant
advantage over Kubernetes.

So it is better to be consistent and go fully with Kubernetes. However,
we need to find a way to help IO authors bootstrap this: in my
experience, creating a cluster with docker-compose takes a YAML file
plus one command, and I am not sure that the basic installation and run
of Kubernetes is that easy.
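
To make the "YAML file plus one command" point concrete, here is a
minimal sketch of such a file. The service names, the pinned image tags
and the password are assumptions for illustration only, not something we
have agreed on for the Beam ITs.

```yaml
# docker-compose.yml (illustrative sketch; names, tags and the password
# below are assumptions, not values agreed on in this thread)
version: "2"
services:
  zookeeper:
    image: zookeeper:3.4          # pinned tag, assumed for illustration
    ports:
      - "2181:2181"               # default ZooKeeper client port
  postgres:
    image: postgres:9.6           # pinned tag, assumed for illustration
    environment:
      POSTGRES_PASSWORD: example  # placeholder, never use a real secret here
    ports:
      - "5432:5432"               # default PostgreSQL port
```

Running 'docker-compose up -d' in the same directory starts both
containers on a default network where each one is reachable from the
other by its service name ("zookeeper", "postgres").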

Ismaël

On Wed, Mar 15, 2017 at 8:09 PM, Stephen Sisk <s...@google.com.invalid> wrote:
> thanks for the discussion! In general, I agree with the sentiments
> expressed here. I updated
> https://docs.google.com/document/d/153J9jPQhMCNi_eBzJfhAg-NprQ7vbf1jNVRgdqeEE8I/edit#heading=h.hlirex1vus1a
> to
> reflect this discussion. (The plan is still that I will put that on the
> website.)
>
> Apache Docker Repository - are you talking about
> https://hub.docker.com/u/apache/ ? If not, can you point me at more info? I
> can't seem to find info about this on the publicly visible apache-infra
> mailing lists that I could find, and the apache infra website doesn't seem
> to mention a docker repository.
>
>
>
>> However the current Beam Elasticsearch IO does not support Elasticsearch
>> 5, and elastic does not have an image for version 2, so in this particular
>> case following the priority order we should use the official docker image
>> (2) for the tests (assuming that both require the same version). Do you
>> agree with this ?
>
> Yup, that makes sense to me.
>
>
>
>> How do we deal with IOs that require more than one base image, this is a
>> common scenario for projects that depend on Zookeeper?
>
> Is there a reason not to just run a kubernetes ReplicationController+Service
> for these cases? k8 can easily support having a hostname that pods can rely
> on having the zookeeper instance. It also uses text config - see
> https://github.com/apache/beam/tree/master/sdks/java/io/jdbc/src/test/resources/kubernetes,
> and sets up the connections/nameservice between the hosts - if other tests
> wanted to rely on postgres, it could just connect to host "postgres" and
> postgres is there.
>
> Basically - I'm trying to keep number of tools at a minimum while still
> providing good support for the functionality we need. Does docker-compose
> provide something beyond the functionality that k8 does? I'm not familiar
> with docker-compose, but looking at
> https://docs.docker.com/compose/overview/#compose-documentation it doesn't
> seem to provide anything that k8 doesn't already.
>
>
> S
>
> On Wed, Mar 15, 2017 at 7:10 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>
> Hi, Thanks for bringing this subject to the mailing list.
>
> +1
> We definitely need a consensus on this, and I agree with your proposal and
> JB’s comments modulo certain clarifications:
>
> I think we shall go in this priority order if the version of the image we
> want is available:
>
> 1. Image provided by the creator of the data source/sink (if they
> officially maintain it). (This is the case of Elasticsearch for example) or
> the Apache projects (if they provide one) as JB mentions.
> 2. Official docker images (because they have security fixes and have
> guaranteed maintenance).
> 3. Non-official docker images or images from other providers that have good
> maintainers e.g. quay.io
>
> It makes sense to use the same image for all the tests, and to use the
> fixed versions supported by the respective IO to avoid possible issues
> during testing between different versions/naming of env variables, etc.
>
> The Elasticsearch case is a 'good' example because it shows all the current
> issues:
>
> We should not use one elasticsearch image (elk) for some tests and a
> different one for the other (the quay one), and if we resolve by priority
> we would take the image provided by the creator (1) for both cases.
> https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
> However the current Beam Elasticsearch IO does not support Elasticsearch 5,
> and elastic does not have an image for version 2, so in this particular
> case following the priority order we should use the official docker image
> (2) for the tests (assuming that both require the same version).
> Do you agree with this ?
>
>
> Thinking about the ELK image I came up with a new question. How do we deal
> with IOs that require more than one base image? This is a common scenario
> for projects that depend on Zookeeper, e.g. Kafka/Solr. Usually people
> coordinate those with a docker-compose file that creates an artificial
> network to connect the Zookeeper image and the Kafka/Solr one by just
> executing the 'docker-compose up' command. Will we adopt this for such
> cases ?
>
> I know that Kubernetes does this too, but the docker-compose format is
> quite easy and textual, and it is usually ready with the docker
> installation, additionally the docker-compose files can easily be
> translated with kompose into Kubernetes resources.
>
> Ismaël
>
> On Wed, Mar 15, 2017 at 3:17 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> Hi Stephen,
>>
>> 1. About the docker repositories, we now have an official Docker repo at
>> Apache. So, for the Apache projects, I would recommend the Apache official
>> repo. Anyway, generally speaking, I would recommend the official repo
>> (from the projects).
>>
>> 2. To avoid "unpredictable" breaking changes, I would pin to a particular
>> version, and explicitly update if needed.
>>
>> 3. It's better that docker images are under a unique responsibility scope
>> as different IOs can use the same resources, so they should use the same
>> provided docker image.
>>
>> By the way, I also have a docker coming for RedisIO ;)
>>
>> Regards
>> JB
>>
>>
>> On 03/15/2017 08:01 AM, Stephen Sisk wrote:
>>
>>> hi!
>>>
>>> as part of doing the work to enable IO ITs, we decided we want to use
>>> docker. As part of that, we need to run docker images and they'll probably
>>> be pulled from a docker repository.
>>>
>>> Questions:
>>> * What docker repositories (and users on docker hub) do we as a group
>>> allow for images we'll run for hosted data stores?
>>>  -> My proposal is we should only use repositories/images that are
>>> regularly updated and that have someone saying that the images we depend
>>> on are secure. In the set of images currently linked to by checked in
>>> code/in PR code, quay.io and official docker images seem fine. They both
>>> have security scans (for what that's worth) and generally seem okay.
>>>
>>> * Do we pin to particular docker images or allow our version to float?
>>>  -> I have seen docker images change in an insecure way (e.g. switching
>>> the name of the password parameter, meaning that the data store was
>>> secure when set up, and became insecure because no password was set after
>>> the image update), so I'd prefer to pin to particular versions, and update
>>> on a periodic basis.
>>>
>>> I'm relatively new to docker best practices, so I'm open to suggestions
>>> on this.
>>>
>>> Current ITs with docker images:
>>> * Jdbc - https://hub.docker.com/_/postgres/  (official image)
>>> * Elasticsearch - https://hub.docker.com/r/sebp/elk/ (semi-official
>>> looking image)
>>> * (PR in-flight
>>> <https://github.com/apache/beam/pull/2193/files#diff-a630b5fff9aebc9e99a3f324c9cf75a9R52>)
>>> HadoopInputFormat's elasticsearch and cassandra tests -
>>> https://hub.docker.com/_/cassandra/ and
>>> https://quay.io/repository/pires/docker-elasticsearch-kubernetes?tag=5.2.2&tab=tags
>>> (official image, and image from quay.io, which provides security audits
>>> of their images)
>>>
>>> The more I think about it, the less I'm excited about the sebp/elk image
>>> - I'm sure it's fine, but I'd prefer using images from a source that we
>>> know is trying to check for security problems.
>>>
>>> There's a secondary problem that we're using two different elasticsearch
>>> images - I'd like to use only one image. I'll follow up on that -
>>> https://issues.apache.org/jira/browse/BEAM-1644
>>>
>>> S
>>>
>>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
