thanks for the discussion! In general, I agree with the sentiments
expressed here. I updated
https://docs.google.com/document/d/153J9jPQhMCNi_eBzJfhAg-NprQ7vbf1jNVRgdqeEE8I/edit#heading=h.hlirex1vus1a
to reflect this discussion. (The plan is still that I will put that on the
website.)

Apache Docker Repository - are you talking about
https://hub.docker.com/u/apache/ ? If not, can you point me at more info? I
couldn't find anything about this on the publicly visible apache-infra
mailing lists, and the apache infra website doesn't seem to mention a
docker repository.



> However the current Beam Elasticsearch IO does not support Elasticsearch
> 5, and elastic does not have an image for version 2, so in this particular
> case following the priority order we should use the official docker image
> (2) for the tests (assuming that both require the same version). Do you
> agree with this?

Yup, that makes sense to me.



> How do we deal with IOs that require more than one base image, this is a
> common scenario for projects that depend on Zookeeper?

Is there a reason not to just run a kubernetes ReplicationController+Service
for these cases? k8s can easily give pods a stable hostname at which they
can reach the zookeeper instance. It also uses text config - see
https://github.com/apache/beam/tree/master/sdks/java/io/jdbc/src/test/resources/kubernetes
- and sets up the connections/nameservice between the hosts: if other tests
wanted to rely on postgres, they could just connect to host "postgres" and
postgres is there.
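
To make that concrete, here is a rough sketch of what a zookeeper
ReplicationController+Service pair could look like. It is written as a small
Python script that prints the manifests as JSON (which kubectl also accepts)
rather than copied from anything checked in. The image tag, the "app" label
and the function name are my own placeholders; 2181 is zookeeper's default
client port.

```python
import json


def zookeeper_manifests(image="zookeeper:3.4.9", name="zookeeper", port=2181):
    """Build a ReplicationController and a Service as plain dicts.

    The image tag and the label key are illustrative placeholders.
    """
    labels = {"app": name}
    controller = {
        "apiVersion": "v1",
        "kind": "ReplicationController",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": labels,
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }
    # The Service name becomes the hostname other pods connect to.
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": labels,
            "ports": [{"port": port, "targetPort": port}],
        },
    }
    return [controller, service]


if __name__ == "__main__":
    manifest_list = {
        "apiVersion": "v1",
        "kind": "List",
        "items": zookeeper_manifests(),
    }
    print(json.dumps(manifest_list, indent=2))
```

With something like this in place, a Kafka or Solr pod in the same namespace
would reach zookeeper at host "zookeeper", the same way the postgres case
works.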

Basically - I'm trying to keep the number of tools at a minimum while still
providing good support for the functionality we need. Does docker-compose
provide something beyond the functionality that k8s does? I'm not familiar
with docker-compose, but looking at
https://docs.docker.com/compose/overview/#compose-documentation it doesn't
seem to provide anything that k8s doesn't already.


S

On Wed, Mar 15, 2017 at 7:10 AM Ismaël Mejía <ieme...@gmail.com> wrote:

Hi, Thanks for bringing this subject to the mailing list.

+1
We definitely need a consensus on this, and I agree with your proposal and
JB’s comments modulo certain clarifications:

I think we shall go in this priority order if the version of the image we
want is available:

1. Image provided by the creator of the data source/sink (if they
officially maintain it). (This is the case of Elasticsearch for example) or
the Apache projects (if they provide one) as JB mentions.
2. Official docker images (because they have security fixes and have
guaranteed maintenance).
3. Non-official docker images or images from other providers that have good
maintainers e.g. quay.io
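
As a rough illustration of how this priority order resolves, here is a small
Python sketch. The function name, the source labels and the version lists are
hypothetical and were not checked against the real registries; "2.4.4" in
particular is just a placeholder for whichever 2.x version the IO supports.

```python
# Lower number = higher priority, following the list above.
PRIORITY = {"creator": 1, "official": 2, "third_party": 3}

# (source kind, image name, versions assumed to be available)
ES_CANDIDATES = [
    ("creator", "docker.elastic.co/elasticsearch/elasticsearch", ["5.2.2"]),
    ("official", "elasticsearch", ["2.4.4", "5.2.2"]),
    ("third_party", "quay.io/pires/docker-elasticsearch-kubernetes", ["5.2.2"]),
]


def choose_image(candidates, wanted_version):
    """Return the highest-priority image that offers wanted_version, or None."""
    usable = [c for c in candidates if wanted_version in c[2]]
    if not usable:
        return None
    _, name, _ = min(usable, key=lambda c: PRIORITY[c[0]])
    return f"{name}:{wanted_version}"


if __name__ == "__main__":
    print(choose_image(ES_CANDIDATES, "2.4.4"))
    print(choose_image(ES_CANDIDATES, "5.2.2"))
```

The point is only that the version filter is applied first and the priority
second, so a lower-priority source wins when the preferred one does not
publish the version we need.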

It makes sense to use the same image for all the tests, and to use the
fixed versions supported by the respective IO to avoid possible issues
during testing between different versions/naming of env variables, etc.
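
If we want to enforce fixed versions mechanically, a check along these lines
could work. This is only a sketch in Python: the function name is made up,
and it treats a reference as pinned when it carries a digest or an explicit
tag other than "latest".

```python
def is_pinned(image_ref):
    """Return True if image_ref has a digest or an explicit non-latest tag."""
    if "@sha256:" in image_ref:
        return True
    # Only look after the last "/" so a registry port is not taken for a tag.
    last_part = image_ref.rsplit("/", 1)[-1]
    if ":" not in last_part:
        return False
    tag = last_part.rsplit(":", 1)[1]
    return tag != "" and tag != "latest"


if __name__ == "__main__":
    for ref in ["postgres", "postgres:latest", "postgres:9.6.2"]:
        print(ref, is_pinned(ref))
```

A test setup could run this over every image reference in the kubernetes
config files and fail when one of them floats.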

The Elasticsearch case is a 'good' example because it shows all the current
issues:

We should not use one elasticsearch image (elk) for some tests and a
different one for the other (the quay one), and if we resolve by priority
we would take the image provided by the creator (1) for both cases.
https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
However the current Beam Elasticsearch IO does not support Elasticsearch 5,
and elastic does not have an image for version 2, so in this particular
case following the priority order we should use the official docker image
(2) for the tests (assuming that both require the same version).
Do you agree with this?


Thinking about the ELK image I came up with a new question. How do we deal
with IOs that require more than one base image? This is a common scenario
for projects that depend on Zookeeper, e.g. Kafka/Solr. Usually people
coordinate those with a docker-compose file that creates an artificial
network to connect the Zookeeper image and the Kafka/Solr one, just by
executing the 'docker-compose up' command. Will we adopt this for such
cases?

I know that Kubernetes does this too, but the docker-compose format is
quite easy and textual, and it is usually ready with the docker
installation. Additionally, the docker-compose files can easily be
translated with kompose into Kubernetes resources.

Ismaël

On Wed, Mar 15, 2017 at 3:17 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi Stephen,
>
> 1. About the docker repositories, we now have an official Docker repo at
> Apache. So, for the Apache projects, I would recommend the Apache official
> repo. Anyway, generally speaking, I would recommend the official repo
> (from the projects).
>
> 2. To avoid "unpredictable" breaking changes, I would pin to a particular
> version, and explicitly update if needed.
>
> 3. It's better that docker images are under a single responsibility scope
> as different IOs can use the same resources, so they should use the same
> provided docker image.
>
> By the way, I also have a docker coming for RedisIO ;)
>
> Regards
> JB
>
>
> On 03/15/2017 08:01 AM, Stephen Sisk wrote:
>
>> hi!
>>
>> as part of doing the work to enable IO ITs, we decided we want to use
>> docker. As part of that, we need to run docker images and they'll
>> probably be pulled from a docker repository.
>>
>> Questions:
>> * What docker repositories (and users on docker hub) do we as a group
>> allow for images we'll run for hosted data stores?
>>  -> My proposal is we should only use repositories/images that are
>> regularly updated and that have someone saying that the images we depend
>> on are secure. In the set of images currently linked to by checked in
>> code/in PR code, quay.io and official docker images seem fine. They both
>> have security scans (for what that's worth) and generally seem okay.
>> * Do we pin to particular docker images or allow our version to float?
>>  -> I have seen docker images change in an insecure way (e.g. switching
>> the name of the password parameter, meaning that the data store was
>> secure when set up, and became insecure because no password was set after
>> the image update), so I'd prefer to pin to particular versions, and
>> update on a periodic basis.
>>
>> I'm relatively new to docker best practices, so I'm open to suggestions
>> on this.
>>
>> Current ITs with docker images:
>> * Jdbc - https://hub.docker.com/_/postgres/  (official image)
>> * Elasticsearch - https://hub.docker.com/r/sebp/elk/ (semi-official
>> looking image)
>> * (PR in-flight
>> <https://github.com/apache/beam/pull/2193/files#diff-a630b5fff9aebc9e99a3f324c9cf75a9R52>)
>> HadoopInputFormat's elasticsearch and cassandra tests -
>> https://hub.docker.com/_/cassandra/ and
>> https://quay.io/repository/pires/docker-elasticsearch-kubernetes?tag=5.2.2&tab=tags
>> (official image, and image from quay.io, which provides security audits
>> of their images)
>>
>> The more I think about it, the less I'm excited about the sebp/elk image
>> - I'm sure it's fine, but I'd prefer using images from a source that we
>> know is trying to check for security problems.
>>
>> There's a secondary problem that we're using two different elasticsearch
>> images - I'd like to use only one image. I'll follow up on that -
>> https://issues.apache.org/jira/browse/BEAM-1644
>>
>> S
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
