You are right.  I had overlooked the state kept in the Invoker's memory in
the ContainerProxy instance via the WarmedData/PrewarmedData instances.

It might need a broader rethinking of the architecture if we want to be
able to salvage the warmed/prewarmed containers of a failed-and-replaced
ContainerPool instance.  It is perhaps also an argument for continuing to
split the workload across multiple ContainerPool instances even when we are
using an underlying cluster-wide scheduler to actually allocate execution
resources: that would reduce the blast radius when a ContainerPool instance
fails.

--dave

Tyson Norris <tnor...@adobe.com.INVALID> wrote on 04/03/2018 12:00:09 PM:

> From: Tyson Norris <tnor...@adobe.com.INVALID>
> To: "dev@openwhisk.apache.org" <dev@openwhisk.apache.org>
> Date: 04/03/2018 12:00 PM
> Subject: Re: Invoker HA on Mesos
>
> One problem with this (delegating to ContainerFactory to share
> prewarm/warm containers with other cluster nodes) is that
> ContainerFactory is currently ignorant of container state,
> and making use of the shared containers requires sharing at least
> some of their state (besides paused/running state). Specifically:
> - when creating a prewarm, the kind needs to be shared
> - when pausing a warm, the action needs to be shared
>
> To handle this, the ContainerFactory.createContainer(),
> Container.suspend() and Container.resume() would have to change to
> propagate this state.
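
A minimal sketch of the state propagation described above, in Python rather than the invoker's Scala. Everything here is hypothetical and only illustrates the shape of the change: `SharedStateStore` stands in for whatever cluster-wide store would be used, and the extra `kind` and `action` parameters are assumptions, not the existing ContainerFactory/Container signatures.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SharedContainerState:
    """What another node would need to know to reuse a container."""
    container_id: str
    kind: Optional[str] = None    # shared when a prewarm container is created
    action: Optional[str] = None  # shared when a warm container is paused
    paused: bool = False


class SharedStateStore:
    """Hypothetical cluster-wide store; a plain dict stands in for it here."""

    def __init__(self) -> None:
        self._states: Dict[str, SharedContainerState] = {}

    def put(self, state: SharedContainerState) -> None:
        self._states[state.container_id] = state

    def get(self, container_id: str) -> SharedContainerState:
        return self._states[container_id]


class Container:
    def __init__(self, container_id: str, store: SharedStateStore) -> None:
        self.container_id = container_id
        self._store = store

    def suspend(self, action: str) -> None:
        # Pausing a warm container: publish the action it is warmed for.
        state = self._store.get(self.container_id)
        state.action = action
        state.paused = True

    def resume(self) -> None:
        # Resuming: the container is in use again, so it is no longer paused.
        self._store.get(self.container_id).paused = False


class ContainerFactory:
    def __init__(self, store: SharedStateStore) -> None:
        self._store = store

    def create_container(self, container_id: str,
                         kind: Optional[str] = None) -> Container:
        # Creating a prewarm container: publish its kind along with its id.
        self._store.put(SharedContainerState(container_id, kind=kind))
        return Container(container_id, self._store)


store = SharedStateStore()
factory = ContainerFactory(store)
prewarm = factory.create_container("c1", kind="nodejs:6")
prewarm.suspend(action="guest/hello")
```

The awkwardness is visible in the sketch: suspend() has to be told which action the container holds, which the factory and container otherwise have no reason to know.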
>
> This seems slightly awkward to me, so I want to put it out for feedback.
> WDYT?
>
>
>
> On Mar 30, 2018, at 2:31 PM, David P Grove <gro...@us.ibm.com> wrote:
>
>
> +1.  I like this design.
>
> --dave
>
> Tyson Norris <tnor...@adobe.com.INVALID> wrote on 03/30/2018 01:37:43 PM:
>
> From: Tyson Norris <tnor...@adobe.com.INVALID>
> To: "dev@openwhisk.apache.org" <dev@openwhisk.apache.org>
> Date: 03/30/2018 01:37 PM
> Subject: Re: Invoker HA on Mesos
>
> Hooking into pause/unpause/destroy of containers seems plausible,
> instead of hooking into the Maps in ContainerPool.
>
> So in the existing PR, the ContainerPool uses an alternate impl for
> Map to store freePool and prewarmPool, and that alternate impl
> initiates the attach to existing containers, when it becomes active.
>
> The ContainerPool could instead potentially delegate to the
> ContainerFactory, e.g. a
> ContainerFactory.reviveContainers(childFactory) => (freePool,
> prewarmPool) - we will still need a way to trigger this on demand
> (e.g. when the standby pool becomes active, in our case, but I think
> that is a minor detail).
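
A rough sketch of what such a revive step could return, in Python rather than Scala. The name `revive_containers`, the `RevivableContainer` type, and passing in a list of discovered containers (instead of a childFactory) are all assumptions made for illustration; the pools are simplified to dicts keyed by container id.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RevivableContainer:
    """Hypothetical view of an existing container found on activation."""
    container_id: str
    kind: Optional[str] = None    # known for prewarm containers
    action: Optional[str] = None  # known for paused warm containers


Pool = Dict[str, RevivableContainer]


def revive_containers(existing: List[RevivableContainer]) -> Tuple[Pool, Pool]:
    """Split existing containers into (free_pool, prewarm_pool)."""
    free_pool: Pool = {}
    prewarm_pool: Pool = {}
    for container in existing:
        if container.action is not None:
            # Already initialized with an action: a warm container.
            free_pool[container.container_id] = container
        elif container.kind is not None:
            # Only the runtime kind is known: a prewarm container.
            prewarm_pool[container.container_id] = container
        # Containers with neither piece of state are skipped.
    return free_pool, prewarm_pool


free_pool, prewarm_pool = revive_containers([
    RevivableContainer("c1", kind="nodejs:6", action="guest/hello"),
    RevivableContainer("c2", kind="python:3"),
    RevivableContainer("c3"),
])
```

The trigger (for example, the standby pool becoming active) would simply call this once and seed the two pools from the result.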
>
> I can try it out; I will be out next week, but if you test any of
> this in the meantime, let me know.
>
> Thanks
> Tyson
>
>
> On Mar 30, 2018, at 9:58 AM, David P Grove <gro...@us.ibm.com> wrote:
>
>
> Tyson Norris <tnor...@adobe.com.INVALID> wrote on 03/27/2018 06:25:59 PM:
>
> Do you have an example of the labels working? I guess the labels are
> changed over time through the lifecycle of the container?
>
>
> Apologies for brutally chopping the email chain; my mail client made a
> horrible hash of it.
>
> Right now, all we are doing with Kube labels is to label each action
> container with its owning invoker on startup.  This lets us delete
> orphaned containers if the invoker crashes and needs to be restarted.
> The labeling happens at [1] and the removal of orphans using the labels
> at [2].
>
> I think the Kube-native version of part of what you are doing with the
> DistributedData for Mesos would be to add and remove additional labels
> to give us the option of attaching a new invoker instance to orphaned
> containers instead of just destroying them.  Interacting with the
> Kubernetes API server to do a labeling operation takes around 10ms, so
> we couldn't do this on a truly hot path.  But we could probably afford
> to update container labels in parallel with pause/unpause operations,
> which could enable re-attachment to any paused containers.
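
The label bookkeeping described above can be sketched as follows. This is an in-memory Python model only: it does not call the Kubernetes API, and the label key `invoker` and all function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

INVOKER_LABEL = "invoker"  # hypothetical label key


@dataclass
class ActionContainer:
    name: str
    labels: Dict[str, str] = field(default_factory=dict)


def label_on_startup(container: ActionContainer, invoker: str) -> None:
    # Record the owning invoker when the action container is started.
    container.labels[INVOKER_LABEL] = invoker


def find_orphans(containers: List[ActionContainer],
                 live_invokers: Set[str]) -> List[ActionContainer]:
    # A container is orphaned if it has an owner label and that owner is gone.
    return [
        c for c in containers
        if INVOKER_LABEL in c.labels
        and c.labels[INVOKER_LABEL] not in live_invokers
    ]


def reattach(container: ActionContainer, new_invoker: str) -> None:
    # Instead of destroying the orphan, hand it to a new invoker instance.
    container.labels[INVOKER_LABEL] = new_invoker


a = ActionContainer("wsk-a")
b = ActionContainer("wsk-b")
label_on_startup(a, "invoker0")
label_on_startup(b, "invoker1")

# invoker0 has crashed; only invoker1 is still alive.
orphans = find_orphans([a, b], {"invoker1"})
for orphan in orphans:
    reattach(orphan, "invoker2")
```

In a real deployment each relabel would be a call to the API server, which is why it would have to run alongside pause/unpause rather than on the hot path.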
>
> --dave
>
> [1] https://github.com/apache/incubator-openwhisk/blob/0b20df0f725a671f8e51c9e8793116476fd22f76/core/invoker/src/main/scala/whisk/core/containerpool/kubernetes/KubernetesContainerFactory.scala#L81
> [2] https://github.com/apache/incubator-openwhisk/blob/0b20df0f725a671f8e51c9e8793116476fd22f76/core/invoker/src/main/scala/whisk/core/containerpool/kubernetes/KubernetesContainerFactory.scala#L57
>
