That's right, and to add to it: we added multi-threading to Python streaming, since a single thread is suboptimal for the streaming use case. Shall we move towards a conclusion on the SDK bundle-processing upper bound?
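A minimal sketch of what such bounded multi-threaded bundle processing could look like. The class name, the worker_count default, and the reject-at-capacity behavior are illustrative assumptions, not the actual Beam SDK harness code; the thread below discusses rejecting excess bundles verbosely rather than queueing them, which is what this models.

```python
import concurrent.futures
import threading

class BundleProcessor:
    """Processes up to worker_count bundles concurrently, rejecting the rest."""

    def __init__(self, worker_count=12):
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=worker_count)
        self._slots = threading.Semaphore(worker_count)

    def submit(self, bundle_fn):
        # Reject verbosely instead of queueing: a silently queued bundle is
        # what allows the scheduling deadlock discussed in this thread.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("SDK harness at capacity; bundle rejected")

        def run():
            try:
                return bundle_fn()
            finally:
                self._slots.release()

        return self._pool.submit(run)
```

With worker_count=1 this degrades to the single-bundle-at-a-time harness discussed below, but the second bundle fails fast instead of deadlocking behind the first.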
On Mon, Aug 20, 2018 at 1:54 PM Lukasz Cwik <lc...@google.com> wrote:
> Ankur, I can see where you are going with your argument. I believe there
> is certain information which is static and won't change at pipeline
> creation time (such as that the Python SDK is most efficient doing one
> bundle at a time) and some which is best determined at runtime, like
> memory and CPU limits, or worker count.
>
> On Mon, Aug 20, 2018 at 1:47 PM Ankur Goenka <goe...@google.com> wrote:
>> I would prefer to keep it dynamic, as it can be changed by the
>> infrastructure or the pipeline author. In the case of Python, the number
>> of concurrent bundles can be changed by setting the pipeline option
>> worker_count, and for Java it can be computed based on the CPUs on the
>> machine.
>>
>> For the Flink runner, we can use the worker_count parameter for now to
>> increase the parallelism. And we can have one container for each
>> mapPartition task on Flink while reusing containers, as container
>> creation is expensive, especially for Python where it installs a bunch
>> of dependencies. There is one caveat though: I have seen machine crashes
>> because of too many simultaneous container creations. We can rate limit
>> container creation in the code to avoid this.
>>
>> Thanks,
>> Ankur
>>
>> On Mon, Aug 20, 2018 at 9:20 AM Lukasz Cwik <lc...@google.com> wrote:
>>> +1 on making the resources part of a proto. Based upon what Henning
>>> linked to, the provisioning API seems like an appropriate place to
>>> provide this information.
>>>
>>> Thomas, I believe the environment proto is the best place to add
>>> information that a runner may want to know about upfront during
>>> pipeline creation. I wouldn't stick this into PipelineOptions for the
>>> long term.
>>> If you have time to capture these thoughts and update the environment
>>> proto, I would suggest going down that path. Otherwise anything short
>>> term like PipelineOptions will do.
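The container-creation rate limit Ankur mentions above could be as simple as a semaphore capping how many container startups are in flight at once. start_container and the slot count here are illustrative stand-ins, not Beam code:

```python
import threading

# At most 4 containers may be starting up at any one time; a burst beyond
# that blocks rather than overwhelming the machine. The cap is an assumption.
_startup_slots = threading.BoundedSemaphore(4)

def create_container(start_container, image):
    """Start a container, waiting if too many startups are already running."""
    with _startup_slots:  # blocks while 4 startups are in flight
        return start_container(image)
```

A token-bucket limiter would additionally bound the startup *rate* rather than just the concurrency; either addresses the crash Ankur observed.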
>>>
>>> On Sun, Aug 19, 2018 at 5:41 PM Thomas Weise <t...@apache.org> wrote:
>>>> For SDKs where the upper limit is constant and known upfront, why not
>>>> communicate this along with the other harness resource info as part of
>>>> the job submission?
>>>>
>>>> Regarding the use of gRPC headers: why not make this explicit in the
>>>> proto instead?
>>>>
>>>> WRT the runner dictating resource constraints: the runner actually may
>>>> also not have that information. It would need to be supplied as part
>>>> of the pipeline options? The cluster resource manager needs to
>>>> allocate resources for both the runner and the SDK harness(es).
>>>>
>>>> Finally, what can be done to unblock the Flink runner / Python until
>>>> the solution discussed here is in place? An extra runner option for
>>>> SDK singleton on/off?
>>>>
>>>> On Sat, Aug 18, 2018 at 1:34 AM Ankur Goenka <goe...@google.com> wrote:
>>>>> Sounds good to me.
>>>>> A gRPC header on the control channel seems to be a good place to add
>>>>> the upper bound information.
>>>>> Added JIRAs:
>>>>> https://issues.apache.org/jira/browse/BEAM-5166
>>>>> https://issues.apache.org/jira/browse/BEAM-5167
>>>>>
>>>>> On Fri, Aug 17, 2018 at 10:51 PM Henning Rohde <hero...@google.com> wrote:
>>>>>> Regarding resources: the runner can currently dictate the
>>>>>> mem/cpu/disk resources that the harness is allowed to use via the
>>>>>> provisioning API. The SDK harness need not -- and should not --
>>>>>> speculate on what else might be running on the machine:
>>>>>>
>>>>>> https://github.com/apache/beam/blob/0e14965707b5d48a3de7fa69f09d88ef0aa48c09/model/fn-execution/src/main/proto/beam_provision_api.proto#L69
>>>>>>
>>>>>> A realistic startup-time computation in the SDK harness would be
>>>>>> something simple like max(1, min(cpu*100, mem_mb/10)), say, and
>>>>>> using at most that number of threads. Or just hardcode it to 300. Or
>>>>>> use a user-provided value.
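Henning's back-of-the-envelope sizing above, written out as code. The function name is illustrative; in practice cpu and mem_mb would come from the provisioning API's resource limits rather than being plain parameters, and integer division stands in for mem_mb/10:

```python
def max_bundle_threads(cpu, mem_mb):
    """Startup-time upper bound on concurrent bundles: max(1, min(cpu*100, mem_mb/10))."""
    return max(1, min(cpu * 100, mem_mb // 10))

# e.g. 4 CPUs and 16384 MB of memory: min(400, 1638) -> 400 threads at most.
```

The max(1, ...) guard matters: even a harness given negligible resources must accept at least one bundle, or it can do no work at all.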
>>>>>> Whatever the value is, it is the maximum number of bundles in
>>>>>> flight allowed at any given time and needs to be communicated to the
>>>>>> runner via some message. Anything beyond it would be rejected (but
>>>>>> this shouldn't happen, because the runner should respect that
>>>>>> number).
>>>>>>
>>>>>> A dynamic computation would use the same limits from the SDK, but
>>>>>> take into account its own resource usage (incl. the usage by running
>>>>>> bundles).
>>>>>>
>>>>>> On Fri, Aug 17, 2018 at 6:20 PM Ankur Goenka <goe...@google.com> wrote:
>>>>>>> I am thinking of the upper bound as more along the lines of a
>>>>>>> theoretical upper limit, or any other static high value, beyond
>>>>>>> which the SDK will reject bundles verbosely. The idea is that the
>>>>>>> SDK will not keep bundles in a queue while waiting on current
>>>>>>> bundles to finish; it will simply reject any additional bundle.
>>>>>>> Beyond this I don't have a good answer for a dynamic upper bound.
>>>>>>> As the SDK does not have the complete picture of the processes on
>>>>>>> the machine with which it shares resources, resources might not be
>>>>>>> a good proxy for an upper bound from the SDK's point of view.
>>>>>>>
>>>>>>> On Fri, Aug 17, 2018 at 6:01 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>>>>> Ankur, how would you expect an SDK to compute a realistic upper
>>>>>>>> bound (upfront or during pipeline computation)?
>>>>>>>>
>>>>>>>> The first thought that came to my mind was that the SDK would
>>>>>>>> provide CPU/memory/... resourcing information, with the runner
>>>>>>>> making a judgement call as to whether it should ask the SDK to do
>>>>>>>> more work or less, but not an explicit "don't do more than X
>>>>>>>> bundles in parallel".
>>>>>>>>
>>>>>>>> On Fri, Aug 17, 2018 at 5:55 PM Ankur Goenka <goe...@google.com> wrote:
>>>>>>>>> Makes sense. Exposing an upper bound on concurrency along with
>>>>>>>>> the optimum concurrency can give a good balance.
>>>>>>>>> This is good information to expose while keeping the
>>>>>>>>> requirements on the SDK simple. An SDK can publish 1 as both the
>>>>>>>>> optimum concurrency and the upper bound to keep things simple.
>>>>>>>>>
>>>>>>>>> Runner introspection of the upper bound on concurrency is
>>>>>>>>> important for correctness, while introspection of the optimum
>>>>>>>>> concurrency is important for efficiency. This separates the
>>>>>>>>> efficiency and correctness requirements.
>>>>>>>>>
>>>>>>>>> On Fri, Aug 17, 2018 at 5:05 PM Henning Rohde <hero...@google.com> wrote:
>>>>>>>>>> I agree with Luke's observation, with the caveat that "infinite
>>>>>>>>>> amount of bundles in parallel" is limited by the available
>>>>>>>>>> resources. For example, the Go SDK harness will accept an
>>>>>>>>>> arbitrary amount of parallel work, but too much work will cause
>>>>>>>>>> either excessive GC pressure with crippling slowness or an
>>>>>>>>>> outright OOM. Unless it's always 1, a reasonable upper bound
>>>>>>>>>> will either have to be provided by the user or computed from the
>>>>>>>>>> mem/cpu resources given. Of course, as some bundles take more
>>>>>>>>>> resources than others, any static value will be an estimate or
>>>>>>>>>> ignore resource limits.
>>>>>>>>>>
>>>>>>>>>> That said, I do not like that an "efficiency" aspect becomes a
>>>>>>>>>> subtle requirement for correctness due to Flink internals. I
>>>>>>>>>> fear that road leads to trouble.
>>>>>>>>>>
>>>>>>>>>> On Fri, Aug 17, 2018 at 4:26 PM Ankur Goenka <goe...@google.com> wrote:
>>>>>>>>>>> The latter case, of the SDK supporting single-bundle execution
>>>>>>>>>>> at a time with the runner not using this flag, is exactly how
>>>>>>>>>>> we got into the deadlock here.
>>>>>>>>>>> I agree with exposing the SDK's optimum concurrency level (1 in
>>>>>>>>>>> the latter case) and letting the runner decide whether to use
>>>>>>>>>>> it.
>>>>>>>>>>> But at the same time, we expect the SDK to handle an infinite
>>>>>>>>>>> number of bundles even if it's not efficient.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Ankur
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Aug 17, 2018 at 4:11 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>>>>>>>>> I believe in practice SDK harnesses will fall into one of two
>>>>>>>>>>>> capabilities: they can process effectively an infinite number
>>>>>>>>>>>> of bundles in parallel, or they can only process a single
>>>>>>>>>>>> bundle at a time.
>>>>>>>>>>>>
>>>>>>>>>>>> I believe it is more difficult for a runner to handle the
>>>>>>>>>>>> latter case well and to perform all the environment management
>>>>>>>>>>>> that would make it efficient. It may be inefficient for an
>>>>>>>>>>>> SDK, but I do believe it should be able to say "I'm not great
>>>>>>>>>>>> at anything more than a single bundle at a time"; utilizing
>>>>>>>>>>>> this information should be optional for a runner.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Aug 17, 2018 at 1:53 PM Ankur Goenka <goe...@google.com> wrote:
>>>>>>>>>>>>> To recap the discussion, it seems that we have come up with
>>>>>>>>>>>>> the following points.
>>>>>>>>>>>>> SDKHarness management and initialization:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. The runner completely owns the work assignment to the
>>>>>>>>>>>>> SDKHarness.
>>>>>>>>>>>>> 2. The runner should know the capabilities and capacity of
>>>>>>>>>>>>> the SDKHarness and should assign work accordingly.
>>>>>>>>>>>>> 3. Spinning up the SDKHarness is the runner's responsibility,
>>>>>>>>>>>>> and it can be done statically (a fixed, pre-configured number
>>>>>>>>>>>>> of SDKHarnesses), dynamically, or based on whatever other
>>>>>>>>>>>>> configuration/logic the runner chooses.
>>>>>>>>>>>>>
>>>>>>>>>>>>> SDKHarness expectations.
>>>>>>>>>>>>> This is more in question, and we should outline the
>>>>>>>>>>>>> responsibilities of the SDKHarness:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. The SDKHarness should publish how many concurrent tasks it
>>>>>>>>>>>>> can execute.
>>>>>>>>>>>>> 2. The SDKHarness should start executing all the task items
>>>>>>>>>>>>> assigned, in parallel, in a timely manner, or fail the task.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, on the simplification side: I think for better
>>>>>>>>>>>>> adoption, we should have a simple SDKHarness as well as
>>>>>>>>>>>>> simple runner integration, to encourage integration with more
>>>>>>>>>>>>> runners. Also, many runners might not expose some of their
>>>>>>>>>>>>> internal scheduling characteristics, so we should not require
>>>>>>>>>>>>> scheduling characteristics for runner integration. Moreover,
>>>>>>>>>>>>> scheduling characteristics can change based on pipeline type,
>>>>>>>>>>>>> infrastructure, available resources, etc. So I am a bit
>>>>>>>>>>>>> hesitant to add runner scheduling specifics to runner
>>>>>>>>>>>>> integration.
>>>>>>>>>>>>> A good balance between SDKHarness complexity and runner
>>>>>>>>>>>>> integration can help ease adoption.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Ankur
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Aug 17, 2018 at 12:22 PM Henning Rohde <hero...@google.com> wrote:
>>>>>>>>>>>>>> Finding a good balance is indeed the art of portability,
>>>>>>>>>>>>>> because the range of capabilities (and assumptions) on both
>>>>>>>>>>>>>> sides is wide.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It was originally the idea to allow the SDK harness to be an
>>>>>>>>>>>>>> extremely simple bundle executor (specifically,
>>>>>>>>>>>>>> single-threaded execution, one instruction at a time),
>>>>>>>>>>>>>> however inefficient -- a more sophisticated SDK harness
>>>>>>>>>>>>>> would support more features and be more efficient.
>>>>>>>>>>>>>> For the issue described here, it seems problematic to me to
>>>>>>>>>>>>>> send non-executable bundles to the SDK harness under the
>>>>>>>>>>>>>> expectation that the SDK harness will concurrently work its
>>>>>>>>>>>>>> way deeply enough down the instruction queue to unblock
>>>>>>>>>>>>>> itself. That would be an extremely subtle requirement for
>>>>>>>>>>>>>> SDK authors, and one practical question becomes: what should
>>>>>>>>>>>>>> an SDK do with a bundle instruction that it doesn't have
>>>>>>>>>>>>>> capacity to execute? If a runner needs to make such
>>>>>>>>>>>>>> assumptions, I think that information should probably rather
>>>>>>>>>>>>>> be explicit, along the lines of proposal 1 -- i.e., some
>>>>>>>>>>>>>> kind of negotiation between the resources allotted to the
>>>>>>>>>>>>>> SDK harness (a preliminary variant is in the provisioning
>>>>>>>>>>>>>> API) and what the SDK harness in return can do (and a valid
>>>>>>>>>>>>>> answer might be: 1 bundle at a time irrespective of
>>>>>>>>>>>>>> resources given), or a per-bundle special "overloaded" error
>>>>>>>>>>>>>> response. For other aspects, such as side input readiness,
>>>>>>>>>>>>>> the runner handles that complexity, and the overall bias has
>>>>>>>>>>>>>> generally been to move complexity to the runner side.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The SDK harness and its initialization overhead are entirely
>>>>>>>>>>>>>> SDK, job type, and even pipeline specific. A docker
>>>>>>>>>>>>>> container is also just a process, btw, and doesn't
>>>>>>>>>>>>>> inherently carry much overhead. That said, on a single host,
>>>>>>>>>>>>>> a static docker configuration is generally a lot simpler to
>>>>>>>>>>>>>> work with.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Henning
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Aug 17, 2018 at 10:18 AM Thomas Weise <t...@apache.org> wrote:
>>>>>>>>>>>>>>> It is good to see this discussed!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think there needs to be a good balance between the SDK
>>>>>>>>>>>>>>> harness's capabilities/complexity and its responsibilities.
>>>>>>>>>>>>>>> Additionally, the user will need to be able to adjust the
>>>>>>>>>>>>>>> runner behavior, since the type of workload executed in the
>>>>>>>>>>>>>>> harness is also a factor. Elsewhere we already discussed
>>>>>>>>>>>>>>> that the current assumption of a single SDK harness
>>>>>>>>>>>>>>> instance per Flink task manager brings problems with it,
>>>>>>>>>>>>>>> and that there needs to be more than one way for the runner
>>>>>>>>>>>>>>> to spin up SDK harnesses.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There was the concern that instantiation of multiple SDK
>>>>>>>>>>>>>>> harnesses per TM host is expensive (resource usage,
>>>>>>>>>>>>>>> initialization time, etc.). That may hold true for a
>>>>>>>>>>>>>>> specific scenario, such as batch workloads and the use of
>>>>>>>>>>>>>>> Docker containers. But it may look totally different for a
>>>>>>>>>>>>>>> streaming topology, or when the SDK harness is just a
>>>>>>>>>>>>>>> process on the same host.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Thomas
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Aug 17, 2018 at 8:36 AM Lukasz Cwik <lc...@google.com> wrote:
>>>>>>>>>>>>>>>> SDK harnesses were always responsible for executing all
>>>>>>>>>>>>>>>> work given to them concurrently. Runners have been
>>>>>>>>>>>>>>>> responsible for choosing how much work to give to the SDK
>>>>>>>>>>>>>>>> harness in such a way that best utilizes it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I understand that multithreading in Python is inefficient
>>>>>>>>>>>>>>>> due to the global interpreter lock; it would be up to the
>>>>>>>>>>>>>>>> runner in this case to make sure that the amount of work
>>>>>>>>>>>>>>>> it gives to each SDK harness best utilizes it, while
>>>>>>>>>>>>>>>> spinning up an appropriate number of SDK harnesses.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Aug 17, 2018 at 7:32 AM Maximilian Michels <m...@apache.org> wrote:
>>>>>>>>>>>>>>>>> Hi Ankur,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for looking into this problem. The cause seems to
>>>>>>>>>>>>>>>>> be Flink's pipelined execution mode. It runs multiple
>>>>>>>>>>>>>>>>> tasks in one task slot and produces a deadlock when the
>>>>>>>>>>>>>>>>> pipelined operators schedule the SDK harness DoFns in
>>>>>>>>>>>>>>>>> non-topological order.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The problem would be resolved if we scheduled the tasks
>>>>>>>>>>>>>>>>> in topological order. Doing that is not easy because they
>>>>>>>>>>>>>>>>> run in separate Flink operators, and the SDK harness
>>>>>>>>>>>>>>>>> would have to have insight into the execution graph
>>>>>>>>>>>>>>>>> (which is not desirable).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The easiest method, which you proposed in 1), is to
>>>>>>>>>>>>>>>>> ensure that the number of threads in the SDK harness
>>>>>>>>>>>>>>>>> matches the number of ExecutableStage DoFn operators.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The approach in 2) is what Flink does as well. It glues
>>>>>>>>>>>>>>>>> together horizontal parts of the execution graph, also in
>>>>>>>>>>>>>>>>> multiple threads. So I agree with your proposed solution.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Max
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 17.08.18 03:10, Ankur Goenka wrote:
>>>>>>>>>>>>>>>>> > Hi,
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > tl;dr A deadlock in task execution caused by limited
>>>>>>>>>>>>>>>>> > task parallelism on the SDKHarness.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > *Setup:*
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > * Job type: /*Beam Portable Python Batch*/ job on a
>>>>>>>>>>>>>>>>> > Flink standalone cluster.
>>>>>>>>>>>>>>>>> > * Only a single job is scheduled on the cluster.
>>>>>>>>>>>>>>>>> > * Everything is running on a single machine with a
>>>>>>>>>>>>>>>>> > single Flink task manager.
>>>>>>>>>>>>>>>>> > * Flink task manager slots is 1.
>>>>>>>>>>>>>>>>> > * Flink parallelism is 1.
>>>>>>>>>>>>>>>>> > * The Python SDKHarness has 1 thread.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > *Example pipeline:*
>>>>>>>>>>>>>>>>> > Read -> MapA -> GroupBy -> MapB -> WriteToSink
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > *Issue:*
>>>>>>>>>>>>>>>>> > With a multi-stage job, Flink schedules the different
>>>>>>>>>>>>>>>>> > dependent sub tasks concurrently on the Flink worker as
>>>>>>>>>>>>>>>>> > long as it can get slots. Each map task is then
>>>>>>>>>>>>>>>>> > executed on the SDKHarness.
>>>>>>>>>>>>>>>>> > It's possible that MapB gets to the SDKHarness before
>>>>>>>>>>>>>>>>> > MapA and hence gets into the execution queue before
>>>>>>>>>>>>>>>>> > MapA. Because we only have 1 execution thread on the
>>>>>>>>>>>>>>>>> > SDKHarness, MapA will never get a chance to execute, as
>>>>>>>>>>>>>>>>> > MapB will never release the execution thread. MapB will
>>>>>>>>>>>>>>>>> > wait for input from MapA. This gets us to a deadlock in
>>>>>>>>>>>>>>>>> > a simple pipeline.
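The deadlock described above can be reproduced in miniature with a thread pool and a queue; all names here are illustrative, not Beam code. MapB is submitted first and blocks waiting for MapA's output, but with a single worker thread MapA can never run:

```python
import queue
import concurrent.futures

def run_pipeline(num_threads):
    a_to_b = queue.Queue()
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=num_threads)

    def map_b():
        # Blocks until MapA produces; with one thread this never happens,
        # so the timeout stands in for an actual hang.
        return a_to_b.get(timeout=2) * 10

    def map_a():
        a_to_b.put(4)
        return "A done"

    # MapB reaches the harness before MapA, as in the scenario above.
    fb = pool.submit(map_b)
    fa = pool.submit(map_a)
    return fa.result(), fb.result()

# run_pipeline(1) fails (map_b times out waiting for map_a);
# run_pipeline(2) completes, because map_a runs on the second thread.
```

This is the same structural problem regardless of the executor: any fixed thread count smaller than the number of concurrently scheduled, mutually dependent stages can wedge.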
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > *Mitigation:*
>>>>>>>>>>>>>>>>> > Set worker_count in the pipeline options to more than
>>>>>>>>>>>>>>>>> > the number of expected sub tasks in the pipeline.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > *Proposal:*
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > 1. We can get the maximum concurrency from the runner
>>>>>>>>>>>>>>>>> > and make sure that we have more threads than the max
>>>>>>>>>>>>>>>>> > concurrency. This approach assumes that Beam has
>>>>>>>>>>>>>>>>> > insight into the runner's execution plan and can make
>>>>>>>>>>>>>>>>> > decisions based on it.
>>>>>>>>>>>>>>>>> > 2. We dynamically create threads and cache them, with a
>>>>>>>>>>>>>>>>> > high upper bound, in the SDKHarness. We can warn if we
>>>>>>>>>>>>>>>>> > are hitting the upper bound of threads. This approach
>>>>>>>>>>>>>>>>> > assumes that the runner does a good job of scheduling
>>>>>>>>>>>>>>>>> > and will distribute tasks more or less evenly.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > We expect good scheduling from runners, so I prefer
>>>>>>>>>>>>>>>>> > approach 2. It is simpler to implement, and the
>>>>>>>>>>>>>>>>> > implementation is not runner specific. This approach
>>>>>>>>>>>>>>>>> > better utilizes resources, as it creates only as many
>>>>>>>>>>>>>>>>> > threads as needed instead of the peak thread
>>>>>>>>>>>>>>>>> > requirement. And last but not least, it gives the
>>>>>>>>>>>>>>>>> > runner control over managing the truly active tasks.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Please let me know if I am missing something, and your
>>>>>>>>>>>>>>>>> > thoughts on the approach.
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>>>>>>> > Ankur
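Proposal 2 above, grow worker threads on demand up to a high cap and warn when the cap is hit, could be sketched as follows. The class name, the 300 bound (echoing Henning's hardcoded example earlier in the thread), and the logger are illustrative assumptions, not Beam's actual implementation. Note that ThreadPoolExecutor already creates threads lazily and caches idle ones, which is most of what the proposal asks for:

```python
import concurrent.futures
import logging
import threading

UPPER_BOUND = 300  # illustrative high cap, per the thread's discussion
_log = logging.getLogger("sdk_harness")

class GrowingWorkerPool:
    """Threads are created on demand and cached; warn when the cap is reached."""

    def __init__(self, upper_bound=UPPER_BOUND):
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=upper_bound)
        self._upper_bound = upper_bound
        self._active = 0
        self._lock = threading.Lock()

    def submit(self, fn, *args):
        with self._lock:
            self._active += 1
            if self._active >= self._upper_bound:
                # Likely a sign the runner is over-scheduling this harness.
                _log.warning("hit upper bound of %d worker threads",
                             self._upper_bound)

        def run():
            try:
                return fn(*args)
            finally:
                with self._lock:
                    self._active -= 1

        return self._pool.submit(run)
```

Under this scheme the deadlock from the original message disappears as long as the number of concurrently scheduled dependent stages stays below the cap, which is why the warning (rather than silent queueing) matters.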