For 4 - there are a number of logistics involved. How do you propose
handling cost, potential DoS, etc.? People in different timezones would
need to be on call for it, since an outage impacts people's ability to do
dev work (or they'd need to be okay with it being down.) Can you give some
reasons why you think it's better than the other options? I put it on the
list, but I'm strongly not a fan.

S

On Sat, Apr 8, 2017 at 5:31 AM Ted Yu <yuzhih...@gmail.com> wrote:

> +1
>
> > On Apr 7, 2017, at 10:46 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
> >
> > Hi Stephen,
> >
> > I think we should go to 1 and 4:
> >
> > 1. Try to use existing images providing what we need. If we don't find an
> > existing image, we can always ask and help the other community to provide
> > one.
> > 4. If we don't find a suitable image, then while waiting for this image, we
> > can store the image in our own "IT dockerhub".
> >
> > Regards
> > JB
> >
> >> On 04/08/2017 01:03 AM, Stephen Sisk wrote:
> >> Wanted to see if anyone else had opinions on this/provide a quick
> update.
> >>
> >> I think for both elasticsearch and HIFIO we can find existing,
> >> supported images that could serve those purposes - HIFIO looks like
> >> it'll be able to do so for cassandra, which was proving tricky.
> >>
> >> So to summarize my current proposed solutions: (ordered by my
> preference)
> >> 1. (new) Strongly urge people to find existing docker images that meet
> our
> >> image criteria - regularly updated/security checked
> >> 2. Start using helm
> >> 3. Push our docker images to docker hub
> >> 4. Host our own public container registry
> >>
> >> S
> >>
> >>> On Tue, Apr 4, 2017 at 10:16 AM Stephen Sisk <s...@google.com> wrote:
> >>>
> >>> I'd like to hear what direction folks want to go in, and from there
> >>> look at the options. I think for some of these options (like running our
> >>> own public registry), Apache infra may be able to help and it's something
> >>> we should look at, but I don't assume they have time to work on this type
> >>> of issue.
> >>>
> >>> S
> >>>
> >>> On Tue, Apr 4, 2017 at 10:00 AM Lukasz Cwik <lc...@google.com.invalid>
> >>> wrote:
> >>>
> >>> Is this something that Apache infra could help us with?
> >>>
> >>> On Mon, Apr 3, 2017 at 7:22 PM, Stephen Sisk <s...@google.com.invalid>
> >>> wrote:
> >>>
> >>>> Summary:
> >>>>
> >>>> For IO ITs that use data stores that need custom docker images in
> order
> >>> to
> >>>> run, we can't currently use them in a kubernetes cluster (which is
> where
> >>> we
> >>>> host our data stores.) I have a couple options for how to solve this
> and
> >>> am
> >>>> looking for feedback from folks involved in creating IO ITs/opinions
> on
> >>>> kubernetes.
> >>>>
> >>>>
> >>>> Details:
> >>>>
> >>>> We've discussed in the past that we'll want to allow developers to
> submit
> >>>> just a dockerfile, and then we'll use that when creating the data
> store
> >>> on
> >>>> kubernetes. This is the case for ElasticsearchIO and I assume more
> data
> >>>> stores in the future will want to do this. It's also looking like
> it'll
> >>> be
> >>>> necessary to use custom docker images for the HadoopInputFormatIO's
> >>>> cassandra ITs - to run a cassandra cluster, there doesn't seem to be a
> >>> good
> >>>> image you can use out of the box.
> >>>>
> >>>> In either case, in order to retrieve a docker image, kubernetes needs a
> >>>> container registry - it will read the docker images from there. A simple
> >>>> private container registry doesn't work because kubernetes config files
> >>>> are static - if local devs try to use the kubernetes files, the files
> >>>> still point at the private container registry, and the devs wouldn't be
> >>>> able to retrieve the images since they don't have access. They'd have to
> >>>> manually edit the files, which in theory is an option, but I don't
> >>>> consider that acceptable since it feels pretty unfriendly (it is simple,
> >>>> though, so if we really don't like the below options we can revisit it.)
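> >>>>
> >>>> (For illustration only - the registry address and file name below are
> >>>> made up - the manual workaround would be roughly:
> >>>>
> >>>>   # rewrite the hard-coded registry in the checked-in k8 file, then deploy
> >>>>   sed -i 's|gcr.io/some-private-project|my-registry.example.com:5000|g' \
> >>>>     cassandra.yaml
> >>>>   kubectl create -f cassandra.yaml
> >>>>
> >>>> which works, but every dev has to remember to do it and to not check the
> >>>> edited file back in.)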
> >>>>
> >>>> Quick summary of the options
> >>>>
> >>>> =======================
> >>>>
> >>>> We can:
> >>>>
> >>>> * Start using something like k8 helm - this adds more dependencies and
> >>>> a small amount of complexity (this is my recommendation, but only by a
> >>>> little)
> >>>>
> >>>> * Start pushing images to docker hub - this means they'll be publicly
> >>>> visible and raises the bar for maintenance of those images
> >>>>
> >>>> * Host our own public container registry - this means running our own
> >>>> public service with costs, etc..
> >>>>
> >>>> Below are detailed discussions of these options. You can skip to the
> "My
> >>>> thoughts on this" section if you're not interested in the details.
> >>>>
> >>>>
> >>>> 1. Templated kubernetes images
> >>>>
> >>>> =========================
> >>>>
> >>>> Kubernetes (k8) does not currently have built-in support for
> >>>> parameterizing scripts - there's an issue open for this[1], but it
> >>>> doesn't seem to be very active.
> >>>>
> >>>> There are tools like Kubernetes helm that allow users to specify
> >>> parameters
> >>>> when running their kubernetes scripts. They also enable a lot more
> >>> (they're
> >>>> probably closer to a package manager like apt-get) - see this
> >>>> description[3] for an overview.
> >>>>
> >>>> I'm open to other options besides helm, but it seems to be the
> officially
> >>>> supported one.
> >>>>
> >>>> How the world would look using helm:
> >>>>
> >>>> * When developing an IO IT, someone (either the developer or one of us)
> >>>> would need to create a chart (the name for the helm script) - it's
> >>>> basically another set of config files, but in theory is as simple as a
> >>>> couple of metadata files plus a templatized version of a regular k8
> >>>> script. This should be trivial compared to the task of creating a k8
> >>>> script.
> >>>>
> >>>> *  When creating an instance of a data store, the developer (or the
> beam
> >>> CI
> >>>> server) would first build the docker image for the data store and
> push to
> >>>> their container registry, then run a command like `helm install -f
> >>>> mydb.yaml --set imageRepo=1.2.3.4`
> >>>>
> >>>> * when done running tests/developing/etc…  the developer/beam CI
> server
> >>>> would run `helm delete -f mydb.yaml`
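> >>>>
> >>>> As a rough sketch of what the chart adds on top of a plain k8 script
> >>>> (the chart/file names and values below are made up, and chart
> >>>> boilerplate like Chart.yaml is omitted):
> >>>>
> >>>>   # templates/db.yaml in the chart is the same spec as the plain k8
> >>>>   # script, except that the image line becomes a parameter:
> >>>>   #   image: "{{ .Values.imageRepo }}/mydb:latest"
> >>>>   # so a developer points it at whatever registry holds their image:
> >>>>   helm install ./mydb --set imageRepo=localhost:5000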
> >>>>
> >>>> Upsides:
> >>>>
> >>>> * Something like helm is pretty interesting - we talked about it as an
> >>>> upside and something we wanted to do when we talked about using
> >>> kubernetes
> >>>>
> >>>> * We pick up a set of working kubernetes scripts this way. The full
> >>>> list is at [2], but some that stood out: mongodb, memcached, mysql,
> >>>> postgres, redis, elasticsearch (incubating), kafka (incubating),
> >>>> zookeeper (incubating) - this could speed development
> >>>>
> >>>> Downsides:
> >>>>
> >>>> * Adds an additional dependency to run our ITs (helm or another k8
> >>>> templating tool)
> >>>>
> >>>> * Requires people to build their own images and run a container
> >>>> registry if they don't already have one (it will not surprise you that
> >>>> there's a docker image for running the registry [0] - so it's not crazy
> >>>> :) ). I *think* this will probably just be a simple one/two line command
> >>>> once we have it scripted - see the sketch after this list.
> >>>>
> >>>> * Helm in particular is kind of heavyweight for what we really need -
> it
> >>>> requires running a service in the k8 cluster and adds additional
> >>>> complexity.
> >>>>
> >>>> * Adds to the complexity of creating a new kubernetes script. Until
> I've
> >>>> tried it, I can't really speak to the complexity, but taking a look at
> >>> the
> >>>> instructions [4], it doesn't seem too bad.
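> >>>>
> >>>> To give a sense of what "run a container registry" involves, a sketch
> >>>> using the registry image from [0] (the beam-cassandra image name is
> >>>> made up):
> >>>>
> >>>>   # run a throwaway local registry, then push the locally built image to it
> >>>>   docker run -d -p 5000:5000 --name registry registry:2
> >>>>   docker build -t localhost:5000/beam-cassandra .
> >>>>   docker push localhost:5000/beam-cassandra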
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> 2. Push images to docker hub
> >>>>
> >>>> =======================
> >>>>
> >>>> This requires that users push images that we want to use to docker
> hub,
> >>> and
> >>>> then our IO ITs will rely on that. I think the developer of the
> >>> dockerfile
> >>>> should be responsible for the image - having the beam project
> responsible
> >>>> for a publicly available artifact (like the docker images) outside of
> our
> >>>> core deliverables doesn't seem like the right move.
> >>>>
> >>>> We would still retain a copy of the source dockerfiles and could
> >>> regenerate
> >>>> the images at any time, so I'm not concerned about a scenario where
> >>> docker
> >>>> hub went away (it would be pretty simple to switch to another repo -
> just
> >>>> change some config files.)
> >>>>
> >>>> For someone running the k8 scripts (ie, running the IO ITs), this is
> >>> pretty
> >>>> easy - they just run the k8 script like they do today.
> >>>>
> >>>> For someone creating the k8 scripts (ie, creating the IO ITs), this is
> >>>> more complex - either they or we have to push the image to docker hub
> >>>> and make sure it's up to date, etc.
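> >>>>
> >>>> For reference, the push side would be roughly the following (the
> >>>> repository name is made up):
> >>>>
> >>>>   # build, tag and publish the image under a docker hub account
> >>>>   docker build -t some-account/beam-cassandra-it:latest .
> >>>>   docker login
> >>>>   docker push some-account/beam-cassandra-it:latest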
> >>>>
> >>>>
> >>>> Upsides:
> >>>>
> >>>> * No additional complexity for IO IT runners.
> >>>>
> >>>> Downsides:
> >>>>
> >>>> * Higher bar for creating the image in the first place - someone has
> to
> >>>> maintain the publicly available docker hub image.
> >>>>
> >>>> * It seems weird to have a custom docker image up on docker hub -
> maybe
> >>>> that's common, but if we need specific changes to images for our
> needs,
> >>> I'd
> >>>> prefer it be private.
> >>>>
> >>>>
> >>>> 3. Run our own *public* container registry
> >>>>
> >>>> ==============================================
> >>>>
> >>>> We would run a beam-specific container registry service - it would be
> >>> used
> >>>> by the apache beam CI servers, but it would also be available for use
> by
> >>>> anyone running beam IO ITs on their local dev setup.
> >>>>
> >>>> From an IO IT creator's perspective, this would look pretty similar to
> how
> >>>> things are now - they just check in a dockerfile. For someone running
> the
> >>>> k8 scripts, they similarly don't need to think about it.
> >>>>
> >>>> Upsides:
> >>>>
> >>>> * We're not adding any additional complexity for the end developer
> >>>>
> >>>> Downsides:
> >>>>
> >>>> * Have to keep docker registry software up to date
> >>>>
> >>>> * The service is a single point of failure for any beam devs running IO ITs
> >>>>
> >>>> * It can incur costs, etc… As an open source project, it doesn't seem
> >>> great
> >>>> for us to be running a public service.
> >>>>
> >>>>
> >>>>
> >>>> My thoughts on this
> >>>>
> >>>> ===============
> >>>>
> >>>> In spite of the additional complexity, I think using k8 helm is
> probably
> >>>> the best option. The general goal behind the IO ITs has been to keep
> >>>> ourselves self-contained: avoid having centralized infrastructure for
> >>> those
> >>>> running the ITs. Helm is a good match for those criteria. I will admit
> >>> that
> >>>> I find the additional dependencies/complexity to be worrisome.
> However, I
> >>>> really like the idea of picking up additional data store configs for
> >>> free -
> >>>> if we were doing this in 5 years, we'd say "we should just use the
> >>>> ecosystem of helm charts" and go from there.
> >>>>
> >>>> I do think that pushing images to docker hub is a viable option, and if
> >>>> the community is more excited to do that/wants to push the images there,
> >>>> I'd support it. I can see how folks would be hesitant. I would like for
> >>>> the developer of the dockerfile to be the one responsible for maintaining
> >>>> that image.
> >>>>
> >>>> Of the 3 options, I would strongly push back against running a public
> >>>> container registry - I would not want to administer it, and I don't
> think
> >>>> we as a project want to be paying for the costs associated with it.
> >>>>
> >>>> Next steps
> >>>>
> >>>> =========
> >>>>
> >>>> Let me know what you think! This is definitely a topic where
> >>> understanding
> >>>> what the community of IO devs wants is helpful. As we discuss, I'll
> >>>> probably spend a little time exploring helm since I want to play
> around
> >>>> with it and understand if there are other drawbacks. I ran into this
> >>>> question while working on getting the HIFIO cassandra cluster running,
> >>> so I
> >>>> might prototype with that.
> >>>>
> >>>> I'll create a JIRA for this in the next day or so.
> >>>>
> >>>> Stephen
> >>>>
> >>>>
> >>>>
> >>>> [0] docker registry container - https://hub.docker.com/_/registry/
> >>>>
> >>>> [1] kubernetes issue open for supporting templates -
> >>>> https://github.com/kubernetes/kubernetes/issues/23896
> >>>>
> >>>> [2] set of available charts - https://github.com/kubernetes/charts
> >>>>
> >>>> [3] kubernetes helm introduction -
> >>>> https://deis.com/blog/2015/introducing-helm-for-kubernetes/
> >>>> [4] kubernetes charts instructions -
> >>>> https://github.com/kubernetes/helm/blob/master/docs/charts.md
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
>
