+1. I like this design. --dave
Tyson Norris <[email protected]> wrote on 03/30/2018 01:37:43 PM:

> From: Tyson Norris <[email protected]>
> To: "[email protected]" <[email protected]>
> Date: 03/30/2018 01:37 PM
> Subject: Re: Invoker HA on Mesos
>
> Hooking into pause/unpause/destroy of containers seems plausible,
> instead of hooking into the Maps in ContainerPool.
>
> So in the existing PR, the ContainerPool uses an alternate impl for
> Map to store freePool and prewarmPool, and that alternate impl
> initiates the attach to existing containers, when it becomes active.
>
> The ContainerPool could instead potentially delegate to the
> ContainerFactory, e.g. a
> ContainerFactory.reviveContainers(childFactory) => (freePool,
> prewarmPool) - we will still need a way to trigger this on demand
> (e.g. when the standby pool becomes active, in our case, but I think
> that is a minor detail).
>
> I can try it out; I will be out next week, but if you test any of
> this in the meantime, let me know.
>
> Thanks
> Tyson
>
> > On Mar 30, 2018, at 9:58 AM, David P Grove <[email protected]> wrote:
> >
> > Tyson Norris <[email protected]> wrote on 03/27/2018 06:25:59 PM:
> >>
> >> Do you have an example of the labels working? I guess the labels are
> >> changed over time through the lifecycle of the container?
> >>
> >
> > Apologies for brutally chopping the email chain; my mail client made a
> > horrible hash of it.
> >
> > Right now, all we are doing with Kube labels is to label each action
> > container with its owning invoker on startup. This lets us delete orphaned
> > containers if the invoker crashes and needs to be restarted. The labeling
> > happens at [1] and the removal of orphans using the labels at [2].
> >
> > I think the Kube-native version of part of what you are doing with the
> > DistributedData for Mesos would be to add and remove additional labels to
> > give us the option of attaching a new invoker instance to orphaned
> > containers instead of just destroying them. Interacting with the
> > Kubernetes API server to do a labeling operation takes around 10ms, so we
> > couldn't do this on a truly hot path. But we could probably afford to
> > update container labels in parallel with pause/unpause operations, which
> > could enable re-attachment to any paused containers.
> >
> > --dave
> >
> > [1] https://github.com/apache/incubator-openwhisk/blob/0b20df0f725a671f8e51c9e8793116476fd22f76/core/invoker/src/main/scala/whisk/core/containerpool/kubernetes/KubernetesContainerFactory.scala#L81
> > [2] https://github.com/apache/incubator-openwhisk/blob/0b20df0f725a671f8e51c9e8793116476fd22f76/core/invoker/src/main/scala/whisk/core/containerpool/kubernetes/KubernetesContainerFactory.scala#L57
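
For readers following the thread, the ContainerFactory.reviveContainers(childFactory) => (freePool, prewarmPool) idea might look roughly like the Scala sketch below. This is only an illustration under stated assumptions, not code from the PR or from OpenWhisk: ContainerRef, listContainers, the invokerName parameter, and the "invoker" and "pool" label keys are all hypothetical stand-ins, and the real pools hold actor references and container data rather than plain strings.

```scala
// Hypothetical sketch; none of these types are the real OpenWhisk classes.
object ReviveSketch {
  // Stand-in for a container that outlived the invoker that created it.
  final case class ContainerRef(id: String, labels: Map[String, String])

  trait ContainerFactory {
    // Assumed lookup of every container the cluster manager still knows about.
    def listContainers(): Seq[ContainerRef]

    // Re-attach to the containers labeled as owned by `invokerName` and
    // return them as (freePool, prewarmPool). `childFactory` builds the
    // proxy that will manage each revived container.
    def reviveContainers[P](invokerName: String, childFactory: ContainerRef => P): (Map[P, ContainerRef], Map[P, ContainerRef]) = {
      val owned = listContainers().filter(_.labels.get("invoker").contains(invokerName))
      // The "pool" label is an assumption; anything not marked prewarm is treated as free.
      val (prewarm, free) = owned.partition(_.labels.get("pool").contains("prewarm"))
      def attach(cs: Seq[ContainerRef]): Map[P, ContainerRef] =
        cs.map(c => childFactory(c) -> c).toMap
      (attach(free), attach(prewarm))
    }
  }

  // Small in-memory example: two containers belong to invoker0, one does not.
  def demo(): (Map[String, ContainerRef], Map[String, ContainerRef]) = {
    val factory = new ContainerFactory {
      def listContainers(): Seq[ContainerRef] = Seq(
        ContainerRef("c1", Map("invoker" -> "invoker0", "pool" -> "free")),
        ContainerRef("c2", Map("invoker" -> "invoker0", "pool" -> "prewarm")),
        ContainerRef("c3", Map("invoker" -> "invoker1", "pool" -> "free")))
    }
    factory.reviveContainers("invoker0", c => s"proxy-${c.id}")
  }

  def main(args: Array[String]): Unit = {
    val (freePool, prewarmPool) = demo()
    println(s"free: ${freePool.keySet}, prewarm: ${prewarmPool.keySet}")
  }
}
```

Filtering on an owner label mirrors how the Kubernetes factory is described as finding orphans today; the sketch only changes what happens to the matches (re-attach instead of delete). How and when the call is triggered, for example when a standby pool becomes active, is left open here as it is in the thread.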
