[ 
https://issues.apache.org/jira/browse/SAMZA-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047230#comment-15047230
 ] 

Jake Maes commented on SAMZA-833:
---------------------------------

Hey [~TaoFeng], thanks for the suggestion, but if I understand your quick fix 
correctly, it would affect every factory other than YarnJobFactory. That would 
cause unexpected behavior if someone were to implement a factory for another 
cluster manager, e.g. MesosJobFactory. So my intuition is that we should scope 
this fix to ProcessJobFactory. 

If you look at the version history of ProcessJobFactory, you can see that in 
SAMZA-465 we stopped passing 1 to the JobCoordinator (which had forced the 
container count to 1) and instead filtered the config so that 
yarn.container.count was stripped from it. Then in SAMZA-805 we removed the 
config filtering, which is why ProcessJob can now think it has more than 1 
container. 

The quick fix is probably to filter out the 
job.container.count/yarn.container.count configs in ProcessJobFactory, but that 
assumes (as we have thus far) that ProcessJob should only have 1 container. An 
alternative, more complicated fix would be to support multiple containers for 
ProcessJob. The quick fix is probably better for now.
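
To make the quick fix concrete, here is a minimal sketch of the filtering 
idea. It uses a plain Map instead of Samza's Config class, and the class and 
method names (ProcessJobConfigFilter, stripContainerCount) are hypothetical, 
not existing Samza code. It also assumes that whatever reads the filtered 
config falls back to a single container when neither key is set. 

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Samza: illustrates the config-filtering idea only.
final class ProcessJobConfigFilter {
    // The two container-count keys named in this comment.
    static final Set<String> CONTAINER_COUNT_KEYS =
        new HashSet<>(Arrays.asList("job.container.count", "yarn.container.count"));

    private ProcessJobConfigFilter() {}

    // Returns a copy of the config without the container-count keys.
    // Assumption: the reader of this config then defaults to 1 container.
    static Map<String, String> stripContainerCount(Map<String, String> config) {
        Map<String, String> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : config.entrySet()) {
            if (!CONTAINER_COUNT_KEYS.contains(entry.getKey())) {
                filtered.put(entry.getKey(), entry.getValue());
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("job.name", "example-job");
        config.put("yarn.container.count", "4");
        // The container-count key is dropped; every other key is kept.
        System.out.println(stripContainerCount(config));
    }
}
```

The input map is left untouched, so only the copy handed to ProcessJob would 
lose the container-count keys. 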

> ProcessJob mishandling containers
> ---------------------------------
>
>                 Key: SAMZA-833
>                 URL: https://issues.apache.org/jira/browse/SAMZA-833
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Jake Maes
>
> As a result of SAMZA-465 and SAMZA-805, ProcessJobFactory now passes the full 
> config to the ProcessJob and no longer forces the container count to 1. This 
> causes the ProcessJob to actually read the container count config and if it 
> is not 1, it produces some unexpected behavior. 
> Specifically we've had reports of ProcessJobs dropping messages because the 
> container count is > 1, so the grouper assigns partitions to more than 1 
> container, but only one container actually runs. 
> The goal of this ticket is to either force the container count to 1 for 
> ProcessJob, or fix how multiple containers run with ProcessJob. But we should 
> not allow the scenario where messages are dropped. 
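
The failure mode in the description can be illustrated with a toy sketch. 
This is not Samza's grouper: the round-robin assignment and the names 
ContainerCountDemo, assign, and unconsumed are assumptions made purely for 
illustration. It shows how partitions assigned to containers that never start 
end up with no consumer. 

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Samza's grouper: simple round-robin assignment.
final class ContainerCountDemo {
    private ContainerCountDemo() {}

    // Assigns partition p to container (p % containerCount).
    static List<List<Integer>> assign(int partitions, int containerCount) {
        List<List<Integer>> byContainer = new ArrayList<>();
        for (int c = 0; c < containerCount; c++) {
            byContainer.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            byContainer.get(p % containerCount).add(p);
        }
        return byContainer;
    }

    // Partitions owned by containers other than container 0, i.e. the ones
    // that nobody reads if only a single container actually runs.
    static List<Integer> unconsumed(int partitions, int containerCount) {
        List<List<Integer>> byContainer = assign(partitions, containerCount);
        List<Integer> result = new ArrayList<>();
        for (int c = 1; c < byContainer.size(); c++) {
            result.addAll(byContainer.get(c));
        }
        return result;
    }

    public static void main(String[] args) {
        // With 4 partitions and a configured count of 2, half are never read.
        System.out.println(unconsumed(4, 2));
        // With a count of 1, every partition lands on the container that runs.
        System.out.println(unconsumed(4, 1));
    }
}
```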



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
