Agreed, it's what I said in a previous email.

Regards
JB

On Apr 10, 2017, 18:58, at 18:58, Ekrem Aksoy <[email protected]> wrote:
>Hi Stephen,
>
>Can we piggyback on current Apache Docker Hub account? I think images can
>be held there, too.
>
>-E
>
>On Mon, Apr 10, 2017 at 5:22 PM, Stephen Sisk <[email protected]> wrote:
>
>> for 4 - there's a number of logistics involved. How do you propose handling
>> cost, potential DOS, etc? People in different timezones would need to be
>> oncall for it since it impacts people's ability to dev work (or they need
>> to be okay if it goes out.) Can you give some reasons why you think it's
>> better than the other options? I put it on the list, but I'm strongly not a
>> fan.
>>
>> S
>>
>> On Sat, Apr 8, 2017 at 5:31 AM Ted Yu <[email protected]> wrote:
>>
>> > +1
>> >
>> > > On Apr 7, 2017, at 10:46 PM, Jean-Baptiste Onofré <[email protected]> wrote:
>> > >
>> > > Hi Stephen,
>> > >
>> > > I think we should go to 1 and 4:
>> > >
>> > > 1. Try to use existing images providing what we need. If we don't find
>> > > existing image, we can always ask and help other community to provide so.
>> > > 4. If we don't find a suitable image, and waiting for this image, we can
>> > > store the image in our own "IT dockerhub".
>> > >
>> > > Regards
>> > > JB
>> > >
>> > >> On 04/08/2017 01:03 AM, Stephen Sisk wrote:
>> > >> Wanted to see if anyone else had opinions on this/provide a quick update.
>> > >>
>> > >> I think for both elasticsearch and HIFIO that we can find existing,
>> > >> supported images that could serve those purposes - HIFIO is looking like
>> > >> it'll be able to do so for cassandra, which was proving tricky.
>> > >>
>> > >> So to summarize my current proposed solutions: (ordered by my preference)
>> > >> 1. (new) Strongly urge people to find existing docker images that meet our
>> > >> image criteria - regularly updated/security checked
>> > >> 2. Start using helm
>> > >> 3. Push our docker images to docker hub
>> > >> 4. Host our own public container registry
>> > >>
>> > >> S
>> > >>
>> > >>> On Tue, Apr 4, 2017 at 10:16 AM Stephen Sisk <[email protected]> wrote:
>> > >>>
>> > >>> I'd like to hear what direction folks want to go in, and from there look
>> > >>> at the options. I think for some of these options (like running our own
>> > >>> public registry), they may be able to and it's something we should look at,
>> > >>> but I don't assume they have time to work on this type of issue.
>> > >>>
>> > >>> S
>> > >>>
>> > >>> On Tue, Apr 4, 2017 at 10:00 AM Lukasz Cwik <[email protected]> wrote:
>> > >>>
>> > >>> Is this something that Apache infra could help us with?
>> > >>>
>> > >>> On Mon, Apr 3, 2017 at 7:22 PM, Stephen Sisk <[email protected]> wrote:
>> > >>>
>> > >>>> Summary:
>> > >>>>
>> > >>>> For IO ITs that use data stores that need custom docker images in order to
>> > >>>> run, we can't currently use them in a kubernetes cluster (which is where we
>> > >>>> host our data stores.) I have a couple options for how to solve this and am
>> > >>>> looking for feedback from folks involved in creating IO ITs/opinions on
>> > >>>> kubernetes.
>> > >>>>
>> > >>>>
>> > >>>> Details:
>> > >>>>
>> > >>>> We've discussed in the past that we'll want to allow developers to submit
>> > >>>> just a dockerfile, and then we'll use that when creating the data store on
>> > >>>> kubernetes. This is the case for ElasticsearchIO and I assume more data
>> > >>>> stores in the future will want to do this. It's also looking like it'll be
>> > >>>> necessary to use custom docker images for the HadoopInputFormatIO's
>> > >>>> cassandra ITs - to run a cassandra cluster, there doesn't seem to be a good
>> > >>>> image you can use out of the box.
>> > >>>>
>> > >>>> In either case, in order to retrieve a docker image, kubernetes needs a
>> > >>>> container registry - it will read the docker images from there. A simple
>> > >>>> private container registry doesn't work because kubernetes config files are
>> > >>>> static - this means that if local devs try to use the kubernetes files,
>> > >>>> they point at the private container registry and they wouldn't be able to
>> > >>>> retrieve the images since they don't have access. They'd have to manually
>> > >>>> edit the files, which in theory is an option, but I don't consider that to
>> > >>>> be acceptable since it feels pretty unfriendly (it is simple, so if we
>> > >>>> really don't like the below options we can revisit it.)
>> > >>>>
>> > >>>> Quick summary of the options
>> > >>>>
>> > >>>> =======================
>> > >>>>
>> > >>>> We can:
>> > >>>>
>> > >>>> * Start using something like k8 helm - this adds more dependencies, adds a
>> > >>>> small amount of complexity (this is my recommendation, but only by a
>> > >>>> little)
>> > >>>>
>> > >>>> * Start pushing images to docker hub - this means they'll be publicly
>> > >>>> visible and raises the bar for maintenance of those images
>> > >>>>
>> > >>>> * Host our own public container registry - this means running our own
>> > >>>> public service with costs, etc..
>> > >>>>
>> > >>>> Below are detailed discussions of these options. You can skip to the "My
>> > >>>> thoughts on this" section if you're not interested in the details.
>> > >>>>
>> > >>>>
>> > >>>> 1. Templated kubernetes images
>> > >>>>
>> > >>>> =========================
>> > >>>>
>> > >>>> Kubernetes (k8) does not currently have built in support for parameterizing
>> > >>>> scripts - there's an issue open for this[1], but it doesn't seem to be
>> > >>>> very active.
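
A minimal sketch of the static-config problem described above, not something proposed in the thread: if the registry address were a parameter rather than hard-coded, each developer could render the same manifest against their own registry. The manifest contents, the `mydb` name, and the `${image_repo}` placeholder are all hypothetical; Python's `string.Template` only stands in here for a real templating tool such as helm.

```python
from string import Template

# Hypothetical pod manifest. The names and the ${image_repo} placeholder are
# illustrative assumptions, not taken from the Beam repository.
MANIFEST_TEMPLATE = Template("""\
apiVersion: v1
kind: Pod
metadata:
  name: mydb
spec:
  containers:
  - name: mydb
    image: ${image_repo}/mydb:latest
""")


def render_manifest(image_repo: str) -> str:
    """Return the manifest text with the registry address filled in."""
    # substitute() raises KeyError if a placeholder is left unfilled,
    # so a half-rendered manifest is never returned.
    return MANIFEST_TEMPLATE.substitute(image_repo=image_repo)


if __name__ == "__main__":
    # 1.2.3.4 is the example registry address used later in this thread.
    print(render_manifest("1.2.3.4"))
```

With this shape, the CI server and a local developer would differ only in the argument they pass to `render_manifest`, which is roughly the role `--set imageRepo=...` plays in the helm option.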
>> > >>>>
>> > >>>> There are tools like Kubernetes helm that allow users to specify parameters
>> > >>>> when running their kubernetes scripts. They also enable a lot more (they're
>> > >>>> probably closer to a package manager like apt-get) - see this
>> > >>>> description[3] for an overview.
>> > >>>>
>> > >>>> I'm open to other options besides helm, but it seems to be the officially
>> > >>>> supported one.
>> > >>>>
>> > >>>> How the world would look using helm:
>> > >>>>
>> > >>>> * When developing an IO IT, someone (either the developer or one of us),
>> > >>>> would need to create a chart (the name for the helm script) - it's
>> > >>>> basically another set of config files but in theory is as simple as a
>> > >>>> couple metadata files plus a templatized version of a regular k8 script.
>> > >>>> This should be trivial compared to the task of creating a k8 script.
>> > >>>>
>> > >>>> * When creating an instance of a data store, the developer (or the beam CI
>> > >>>> server) would first build the docker image for the data store and push to
>> > >>>> their container registry, then run a command like
>> > >>>> `helm install -f mydb.yaml --set imageRepo=1.2.3.4`
>> > >>>>
>> > >>>> * when done running tests/developing/etc… the developer/beam CI server
>> > >>>> would run `helm delete -f mydb.yaml`
>> > >>>>
>> > >>>> Upsides:
>> > >>>>
>> > >>>> * Something like helm is pretty interesting - we talked about it as an
>> > >>>> upside and something we wanted to do when we talked about using kubernetes
>> > >>>>
>> > >>>> * We pick up a set of working kubernetes scripts this way. The full list is
>> > >>>> at [2], but some ones that stood out: mongodb, memcached, mysql, postgres,
>> > >>>> redis, elasticsearch (incubating), kafka (incubating), zookeeper
>> > >>>> (incubating) - this could speed development
>> > >>>>
>> > >>>> Downsides:
>> > >>>>
>> > >>>> * Adds an additional dependency to run our ITs (helm or another k8
>> > >>>> templating tool)
>> > >>>>
>> > >>>> * Requires people to build their own images and run a container registry if
>> > >>>> they don't already have one (it will not surprise you that there's a docker
>> > >>>> image for running the registry [0] - so it's not crazy. :) I *think* this
>> > >>>> will probably just be a simple one/two line command once we have it
>> > >>>> scripted.
>> > >>>>
>> > >>>> * Helm in particular is kind of heavyweight for what we really need - it
>> > >>>> requires running a service in the k8 cluster and adds additional
>> > >>>> complexity.
>> > >>>>
>> > >>>> * Adds to the complexity of creating a new kubernetes script. Until I've
>> > >>>> tried it, I can't really speak to the complexity, but taking a look at the
>> > >>>> instructions [4], it doesn't seem too bad.
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> 2. Push images to docker hub
>> > >>>>
>> > >>>> =======================
>> > >>>>
>> > >>>> This requires that users push images that we want to use to docker hub, and
>> > >>>> then our IO ITs will rely on that. I think the developer of the dockerfile
>> > >>>> should be responsible for the image - having the beam project responsible
>> > >>>> for a publicly available artifact (like the docker images) outside of our
>> > >>>> core deliverables doesn't seem like the right move.
>> > >>>>
>> > >>>> We would still retain a copy of the source dockerfiles and could regenerate
>> > >>>> the images at any time, so I'm not concerned about a scenario where docker
>> > >>>> hub went away (it would be pretty simple to switch to another repo - just
>> > >>>> change some config files.)
>> > >>>>
>> > >>>> For someone running the k8 scripts (ie, running the IO ITs), this is pretty
>> > >>>> easy - they just run the k8 script like they do today.
>> > >>>>
>> > >>>> For someone creating the k8 scripts (ie, creating the IO ITs), this is more
>> > >>>> complex - either they or we have to push this to docker hub and make sure
>> > >>>> it's up to date, etc..
>> > >>>>
>> > >>>>
>> > >>>> Upsides:
>> > >>>>
>> > >>>> * No additional complexity for IO IT runners.
>> > >>>>
>> > >>>> Downsides:
>> > >>>>
>> > >>>> * Higher bar for creating the image in the first place - someone has to
>> > >>>> maintain the publicly available docker hub image.
>> > >>>>
>> > >>>> * It seems weird to have a custom docker image up on docker hub - maybe
>> > >>>> that's common, but if we need specific changes to images for our needs, I'd
>> > >>>> prefer it be private.
>> > >>>>
>> > >>>>
>> > >>>> 3. Run our own *public* container registry
>> > >>>>
>> > >>>> ==============================================
>> > >>>>
>> > >>>> We would run a beam-specific container registry service - it would be used
>> > >>>> by the apache beam CI servers, but it would also be available for use by
>> > >>>> anyone running beam IO ITs on their local dev setup.
>> > >>>>
>> > >>>> From an IO IT creator's perspective, this would look pretty similar to how
>> > >>>> things are now - they just check in a dockerfile. For someone running the
>> > >>>> k8 scripts, they similarly don't need to think about it.
>> > >>>>
>> > >>>> Upsides:
>> > >>>>
>> > >>>> * we're not adding any additional complexity for end developer
>> > >>>>
>> > >>>> Downsides:
>> > >>>>
>> > >>>> * Have to keep docker registry software up to date
>> > >>>>
>> > >>>> * The service is a single point of failure for any beam devs running IO ITs
>> > >>>>
>> > >>>> * It can incur costs, etc… As an open source project, it doesn't seem great
>> > >>>> for us to be running a public service.
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> My thoughts on this
>> > >>>>
>> > >>>> ===============
>> > >>>>
>> > >>>> In spite of the additional complexity, I think using k8 helm is probably
>> > >>>> the best option. The general goal behind the IO ITs has been to keep
>> > >>>> ourselves self-contained: avoid having centralized infrastructure for those
>> > >>>> running the ITs. Helm is a good match for those criteria. I will admit that
>> > >>>> I find the additional dependencies/complexity to be worrisome. However, I
>> > >>>> really like the idea of picking up additional data store configs for free -
>> > >>>> if we were doing this in 5 years, we'd say "we should just use the
>> > >>>> ecosystem of helm charts" and go from there.
>> > >>>>
>> > >>>> I do think that pushing images to docker hub is a viable option, and if the
>> > >>>> community is more excited to do that/wants to push the images there, I'd
>> > >>>> support it. I can see how folks would be hesitant. I would like for the
>> > >>>> developer of the docker file to do
>> > >>>>
>> > >>>> Of the 3 options, I would strongly push back against running a public
>> > >>>> container registry - I would not want to administer it, and I don't think
>> > >>>> we as a project want to be paying for the costs associated with it.
>> > >>>>
>> > >>>> Next steps
>> > >>>>
>> > >>>> =========
>> > >>>>
>> > >>>> Let me know what you think! This is definitely a topic where understanding
>> > >>>> what the community of IO devs wants is helpful. As we discuss, I'll
>> > >>>> probably spend a little time exploring helm since I want to play around
>> > >>>> with it and understand if there are other drawbacks. I ran into this
>> > >>>> question while working on getting the HIFIO cassandra cluster running, so I
>> > >>>> might prototype with that.
>> > >>>>
>> > >>>> I'll create JIRA for this in the next day or so.
>> > >>>>
>> > >>>> Stephen
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> [0] docker registry container - https://hub.docker.com/_/registry/
>> > >>>>
>> > >>>> [1] kubernetes issue open for supporting templates -
>> > >>>> https://github.com/kubernetes/kubernetes/issues/23896
>> > >>>>
>> > >>>> [2] set of available charts - https://github.com/kubernetes/charts
>> > >>>>
>> > >>>> [3] kubernetes helm introduction -
>> > >>>> https://deis.com/blog/2015/introducing-helm-for-kubernetes/
>> > >>>> [4] kubernetes charts instructions -
>> > >>>> https://github.com/kubernetes/helm/blob/master/docs/charts.md
>> > >
>> > > --
>> > > Jean-Baptiste Onofré
>> > > [email protected]
>> > > http://blog.nanthrax.net
>> > > Talend - http://www.talend.com
>> >
>>
