Dear all, does anyone have any comments on this? Any comments or opinions would be greatly welcome :)
I think we need roughly the following changes to take this in:

1. SPI support for ContainerProxy and ContainerPool (I already opened a PR for this: https://github.com/apache/incubator-openwhisk/pull/3663)
2. SPI support for the throttling/entitlement logic
3. A new loadbalancer
4. A new ContainerPool and ContainerProxy
5. New notions of limits, and logic to handle more fine-grained limits (these should be able to coexist with the existing limits)

If there are no comments, I will open PRs for these one by one. Then it will be easier for us to discuss, because we can discuss directly over a basic implementation.

Thanks
Regards
Dominic.

2018-05-09 11:42 GMT+09:00 Dominic Kim <[email protected]>:

> One more thing to clarify is invoker parallelism.
>
> > ## Invoker coordinates all requests.
> >
> > I tend to disagree with the "cannot take advantage of parallel processing" bit. Everything in the invoker is parallelized after updating its central state (which should take a **very** short amount of time relative to actual action runtime). It is not really optimized to scale to a lot of containers *yet*.
>
> More precisely, I think it is related to the Kafka consumer rather than the invoker. The invoker logic can run in parallel, but `MessageFeed` seems not to. Once the number of outstanding messages reaches the maximum, it waits for messages to be processed. If a few activation messages for an action are not handled properly, `MessageFeed` does not consume more messages from Kafka (until some messages are processed).
> So subsequent messages for other actions cannot be fetched, or are delayed, because of the unprocessed messages. This is why I mentioned invoker parallelism; I should probably rephrase it as `MessageFeed` parallelism.
>
> As you know, a partition is the unit of parallelism in Kafka. If we have multiple partitions for an activation topic, we can set up multiple consumers, and that enables parallel processing of the Kafka messages as well.
> Since the logic in the invoker can already run in parallel, with this change we can process messages entirely in parallel.
>
> In my proposal, I split activation messages from container-coordination messages (action parallelism), assign more partitions to the activation messages (in-topic parallelism), and enable parallel processing with multiple consumers, i.e. containers.
>
> Thanks
> Regards
> Dominic.
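To make the partition-based consumer parallelism in my message quoted above a bit more concrete, here is a rough sketch using the plain Kafka consumer API. Everything here is illustrative only: the topic name, the one-group-per-action layout and the configuration values are placeholders, not the proposed implementation.

```scala
import java.util.{Collections, Properties}

import scala.collection.JavaConverters._

import org.apache.kafka.clients.consumer.KafkaConsumer

object ContainerConsumerSketch {
  // Hypothetical per-action activation topic, e.g. "activations-<namespace>-<action>".
  val topic = "activations-guest-myAction"

  // Each container (or its ContainerProxy) would run one consumer in the same consumer
  // group. Kafka assigns every consumer in the group a subset of the topic's partitions,
  // so with several partitions the activations are fetched and processed in parallel.
  def runConsumer(containerId: String): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // placeholder broker address
    props.put("group.id", s"containers-of-$topic")   // one consumer group per action
    props.put("client.id", containerId)
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList(topic))

    while (true) {
      val records = consumer.poll(java.time.Duration.ofMillis(500))
      records.asScala.foreach { record =>
        // This is where the activation would be run against the warm container.
        println(s"partition=${record.partition()} offset=${record.offset()}: ${record.value()}")
      }
    }
  }
}
```

The nice property of the consumer-group model is that adding or removing a container simply triggers a rebalance, so the partitions (and therefore the pending activations) are redistributed over the remaining containers automatically.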
> 2018-05-08 19:34 GMT+09:00 Dominic Kim <[email protected]>:
>
>> Thank you for the response, Markus and Christian.
>>
>> Yes, I agree that we need to discuss this proposal in an abstract way rather than in conjunction with any specific technology, because then we can pick a better software stack if one is available.
>> Let me answer your questions one by one.
>>
>> ## Does not wait for previous run.
>>
>> Yes, that is a valid point. If we keep accumulating requests in the queue, latency can become spiky, especially when the execution time of an action is large.
>> So if we want to take this in, we need to find a proper way to balance creating more containers (for latency) against letting existing containers handle the requests.
>>
>> ## Not able to accurately control concurrent invocation.
>>
>> OK, I originally thought this was about concurrent containers rather than concurrent activations.
>> But I am still inclined toward the concurrent-containers approach.
>> In the current logic, the limit depends on factors other than the real number of concurrent invocations.
>> If the RTT between controllers and invokers becomes high for some reason, the controller will reject new requests even though the invokers are actually idle.
>>
>> ## TPS is not deterministic.
>>
>> I did not mean deterministic TPS for just one user; I meant system-wide deterministic TPS.
>> Surely TPS can vary when heterogeneous actions (with different execution times) are invoked.
>> But currently it is not easy to figure out what the TPS is even with only one kind of action, because it changes based not only on the heterogeneity of the actions but also on the number of users and namespaces.
>>
>> I think we should at least be able to state an official spec of this kind: when actions with a 20 ms execution time are invoked, our system delivers 20,000 TPS (no matter how many users or namespaces are used).
>>
>> Your understanding of my proposal is perfectly correct.
>> One small thing to add: the controller sends a `ContainerCreation` request based on the processing speed of the containers rather than on the availability of existing containers.
>>
>> BTW, regarding your concern about Kafka topics, I think we may be fine because, although the number of topics will be unbounded, the number of active topics will be bounded.
>>
>> If we take this approach, it is mandatory to limit the retention bytes and retention duration for each topic.
>> So the number of active topics is limited and the actual data in them is also limited, which I think would be fine.
>>
>> But it will be necessary to find optimal retention configurations and to run many benchmarks to confirm this.
>>
>> And I didn't get the meaning of the eventual consistency of the consumer lag.
>> Did you mean it is eventually consistent because it changes very quickly, even within one second?
>>
>> Thanks
>> Regards
>> Dominic
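On the retention point in my reply above: what I have in mind is roughly per-topic retention settings like the ones below, applied when a per-action topic is created. This is only a sketch; the topic layout, partition count and all values are placeholders that would need benchmarking, not a recommendation.

```scala
import java.util.{Collections, Properties}

import scala.collection.JavaConverters._

import org.apache.kafka.clients.admin.{AdminClient, NewTopic}

object PerActionTopicSketch {
  // Create a per-action topic whose data is bounded in both size and age, so that an
  // unbounded number of (mostly inactive) topics does not mean unbounded storage.
  def createBoundedTopic(topic: String): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // placeholder broker address

    val retention = Map(
      "retention.ms"    -> "600000",   // keep messages for at most 10 minutes (placeholder)
      "retention.bytes" -> "10485760", // and at most 10 MiB per partition (placeholder)
      "cleanup.policy"  -> "delete"
    ).asJava

    val admin = AdminClient.create(props)
    try {
      val newTopic = new NewTopic(topic, 4, 1.toShort).configs(retention)
      admin.createTopics(Collections.singletonList(newTopic)).all().get()
    } finally {
      admin.close()
    }
  }
}
```

With bounds like these, an idle topic shrinks back to almost nothing once its retention window passes, so only the active topics really cost anything. Whether such limits are safe in practice is exactly what the benchmarks mentioned above would have to confirm.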
>> 2018-05-08 17:25 GMT+09:00 Markus Thoemmes <[email protected]>:
>>
>>> Hey Dominic,
>>>
>>> Thank you for the very detailed writeup. Since there is a lot in here, please allow me to rephrase some of your proposals to see if I understood correctly. I'll go through point-by-point to try to keep it close to your proposal.
>>>
>>> **Note:** This is a result of an extensive discussion between Christian Bickel (@cbickel) and myself on this proposal. I used "I" throughout the writeup for easier readability, but all of it can be read as "we".
>>>
>>> # Issues:
>>>
>>> ## Interventions of actions.
>>>
>>> That's a valid concern when using today's loadbalancer. This is noisy-neighbor behavior that can happen today under the circumstances you describe.
>>>
>>> ## Does not wait for previous run.
>>>
>>> True as well today. The algorithms used until today value correctness over performance. You're right that you could track the expected queue occupation and schedule accordingly. That does have its own risks though (what if your action has very spiky latency behavior?).
>>>
>>> I'd generally propose to break this out into a separate discussion. It doesn't really correlate to the other points, WDYT?
>>>
>>> ## Invoker coordinates all requests.
>>>
>>> I tend to disagree with the "cannot take advantage of parallel processing" bit. Everything in the invoker is parallelized after updating its central state (which should take a **very** short amount of time relative to actual action runtime). It is not really optimized to scale to a lot of containers *yet*.
>>>
>>> ## Not able to accurately control concurrent invocation.
>>>
>>> Well, the limits are "concurrent actions in the system". You should be able to get 5 activations on the queue with today's mechanism. You should get as many containers as needed to handle your load. For very short-running actions, you might not need N containers to handle N messages in the queue.
>>>
>>> ## TPS is not deterministic.
>>>
>>> I'm wondering: Has TPS been deterministic for just one user? I'd argue that this is a valid metric in its own right. I agree that these numbers can drop significantly under heterogeneous load.
>>>
>>> # Proposal:
>>>
>>> I'll try to rephrase and add some bits of abstraction here and there to see if I understood this correctly:
>>>
>>> The controller should schedule based on individual actions. It should not send those to an arbitrary invoker but rather to something that identifies those actions themselves (a Kafka topic in your example). I'll call this *PerActionContainerPool*. Those calls from the controller will be handled by each *ContainerProxy* directly rather than being threaded through another "centralized" component (the invoker). The *ContainerProxy* is responsible for handling the "aftermath": writing activation records, collecting logs etc. (like today).
>>>
>>> Iff the controller thinks that the existing containers cannot sustain the load (i.e. if all containers are currently in use), it advises a *ContainerCreationSystem* (all invokers combined in your case) to create a new container. This container will be added to the *PerActionContainerPool*.
>>>
>>> The invoker in your proposal has no scheduling logic at all (which is sound with the issues laid out above) other than container creation itself.
>>>
>>> # Conclusion:
>>>
>>> I like the proposal in the abstract way I've tried to phrase it above. It indeed amplifies warm-container usage and in general should be superior to the more statistical approach of today's loadbalancer.
>>>
>>> I think we should discuss this proposal in an abstract, non-technology-bound way. I do think that having so many Kafka topics, including all the rebalancing needed, can become an issue, especially because the sheer number of Kafka topics is unbounded. I also think that the consumer lag is subject to eventual consistency, and depending on how eventual that is, it can turn into queueing in your system even though that wouldn't be necessary from a capacity perspective.
>>>
>>> I don't want to ditch the proposal because of those concerns though!
>>>
>>> As I said: The proposal itself makes a lot of sense and I like it a lot! Let's not trap ourselves in the technology used today though. You're proposing a major restructuring, so we might as well think more green-fieldy. WDYT?
>>>
>>> Cheers,
>>> Christian and Markus
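Markus' abstractions above (a *PerActionContainerPool* per action, the *ContainerProxy* handling the aftermath directly, and a *ContainerCreationSystem* that only creates containers) match what I have in mind. Just to anchor the discussion, here is one purely illustrative way those pieces could be expressed as interfaces. None of these types exist in OpenWhisk in this form, and every signature below is made up; the real shapes would come out of the PRs mentioned at the top.

```scala
import scala.concurrent.Future

// All of the types below are illustrative placeholders, not existing OpenWhisk classes.
final case class ActionId(namespace: String, name: String)
final case class Activation(activationId: String, action: ActionId, payload: String)
final case class ActivationResult(activationId: String, success: Boolean, logs: Seq[String])

// The pool of warm containers dedicated to one action. The controller publishes
// activations "to the action" (e.g. to a per-action Kafka topic) instead of to an invoker.
trait PerActionContainerPool {
  def action: ActionId
  def submit(activation: Activation): Unit
  def idleContainers: Int
}

// Each container proxy consumes activations for its action directly and handles the
// aftermath itself: running the code, writing the activation record, collecting logs.
trait ContainerProxy {
  def run(activation: Activation): Future[ActivationResult]
  def writeActivationRecord(result: ActivationResult): Future[Unit]
  def collectLogs(activation: Activation): Future[Seq[String]]
}

// Advised by the controller only when the existing containers cannot sustain the load
// (for example, based on the processing speed / consumer lag of the action's topic).
trait ContainerCreationSystem {
  def createContainer(action: ActionId): Future[ContainerProxy]
}
```

If this rough shape looks reasonable, the SPI work in change 1 of my list above is what would let such pieces be plugged in alongside the existing ContainerPool and ContainerProxy.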
