[jira] [Commented] (SAMZA-1282) Spinning up more containers than the number of tasks kills leader

Shanthoosh Venkataraman (JIRA) Mon, 10 Jul 2017 15:27:14 -0700

    [ 
https://issues.apache.org/jira/browse/SAMZA-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081282#comment-16081282
 ]


Shanthoosh Venkataraman commented on SAMZA-1282:
------------------------------------------------

Scenario: There are more stream processors than tasks, i.e. X > Y, where X is 
the number of stream processors and Y is the number of tasks.

Current behavior: The leader fails with a RuntimeException.

Possible solutions: 
Solution A:

Sort the stream processors by the unique ZooKeeper sequential id associated 
with each processor. Generate the job model using the Y lexicographically 
smallest stream processors and kill the remaining stream processors.

Pros: 
* Straightforward and doesn't require much change.

Cons: 
* The additional stream processors are killed rather than kept available to 
take over when existing members of the processor group die.
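
The selection step in Solution A could look roughly like the sketch below. 
It is only an illustration, not Samza code: the class and method names are 
hypothetical, and it assumes the processor ids share a common prefix followed 
by ZooKeeper's zero-padded sequence number, so that plain string ordering 
matches creation order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Samza: splits the live processors into the
// ones that get tasks and the surplus ones that Solution A would kill.
final class ProcessorSelection {

    // Returns a sorted copy so the caller's list is left untouched.
    private static List<String> sortedCopy(List<String> processorIds) {
        List<String> sorted = new ArrayList<>(processorIds);
        Collections.sort(sorted); // lexicographic order of the sequential ids
        return sorted;
    }

    // Number of processors that can actually be given work.
    private static int cutoff(int processorCount, int taskCount) {
        return Math.max(0, Math.min(taskCount, processorCount));
    }

    // The Y lexicographically smallest processors (Y = taskCount).
    public static List<String> selectActive(List<String> processorIds, int taskCount) {
        List<String> sorted = sortedCopy(processorIds);
        return new ArrayList<>(sorted.subList(0, cutoff(sorted.size(), taskCount)));
    }

    // Everything after the first Y processors; empty when X <= Y.
    public static List<String> selectSurplus(List<String> processorIds, int taskCount) {
        List<String> sorted = sortedCopy(processorIds);
        return new ArrayList<>(sorted.subList(cutoff(sorted.size(), taskCount), sorted.size()));
    }
}
```

With this split, the job model would be generated from selectActive(...) only, 
and the leader would signal the processors in selectSurplus(...) to shut down.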

Solution B:

Sort the stream processors by the unique ZooKeeper sequential id associated 
with each processor. Generate the job model using the Y lexicographically 
smallest stream processors and let the additional stream processors stay 
alive as standbys (one could join the group when a chosen stream processor 
dies). This requires each stream processor to hold local state (whether or 
not it is part of the group) and to ignore ZooKeeper events while it is not 
part of the group.

Pros:
* Improved fault tolerance to stream processor deaths in a group.

Cons: 
* Expected performance drop, since standby processors consume system 
resources and still receive ZooKeeper events.
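
The local state that Solution B asks each stream processor to keep could be 
sketched as follows. This is an assumption-laden illustration, not the Samza 
implementation: the class, its method names, and the idea of learning group 
membership from the set of processor ids in the latest job model are all 
hypothetical.

```java
import java.util.Set;

// Hypothetical sketch, not part of Samza: a processor that remembers whether
// it is part of the group and drops group events while it is a standby.
final class StandbyAwareProcessor {
    private final String processorId;
    private boolean inGroup = false; // local state: part of the group or not
    private int handledEvents = 0;

    public StandbyAwareProcessor(String processorId) {
        this.processorId = processorId;
    }

    // Assumed hook: invoked with the processor ids contained in the latest
    // job model. A standby joins the group once its id shows up here.
    public void onNewJobModel(Set<String> processorIdsInModel) {
        inGroup = processorIdsInModel.contains(processorId);
    }

    // Assumed hook for any other group event. Returns true if the event was
    // handled and false if it was ignored because this processor is a standby.
    public boolean onGroupEvent(String eventName) {
        if (!inGroup) {
            return false;
        }
        handledEvents++;
        return true;
    }

    public boolean isInGroup() {
        return inGroup;
    }

    public int getHandledEvents() {
        return handledEvents;
    }
}
```

A standby still has to receive and inspect every event before dropping it, 
which is the resource cost listed under Cons.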

> Spinning up more containers than the number of tasks kills leader
> -----------------------------------------------------------------
>
>                 Key: SAMZA-1282
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1282
>             Project: Samza
>          Issue Type: Bug
>          Components: container
>    Affects Versions: 0.13.0
>            Reporter: Bharath Kumarasubramanian
>            Assignee: Shanthoosh Venkataraman
>             Fix For: 0.13.1
>
>
> When a user tries to spin up more containers than the max partitions or 
> tasks, the leader process gets killed. 
> We throw an exception in the TaskNameGrouper for the above scenario; the 
> leader needs to handle it gracefully and kill the newly spun-up containers, 
> as opposed to bailing out.
> Here is the stack trace 
> {code}
> 2017-05-10 15:13:24.526 [debounce-thread-0] ScheduleAfterDebounceTime [ERROR] OnProcessorChange threw an exception.
> java.lang.IllegalArgumentException: number of containers 2 is bigger than number of tasks 1
>       at org.apache.samza.container.grouper.task.GroupByContainerIds.group(GroupByContainerIds.java:68)
>       at org.apache.samza.coordinator.JobModelManager$.readJobModel(JobModelManager.scala:258)
>       at org.apache.samza.coordinator.JobModelManager.readJobModel(JobModelManager.scala)
>       at org.apache.samza.zk.ZkJobCoordinator.generateNewJobModel(ZkJobCoordinator.java:212)
>       at org.apache.samza.zk.ZkJobCoordinator.doOnProcessorChange(ZkJobCoordinator.java:125)
>       at org.apache.samza.zk.ZkJobCoordinator.lambda$onProcessorChange$1(ZkJobCoordinator.java:120)
>       at org.apache.samza.zk.ScheduleAfterDebounceTime.lambda$scheduleAfterDebounceTime$0(ScheduleAfterDebounceTime.java:89)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
