[ 
https://issues.apache.org/jira/browse/SAMZA-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180488#comment-15180488
 ] 

Navina Ramesh commented on SAMZA-882:
-------------------------------------

To add to what Yi pointed out regarding the hash function, I think the stream 
processor should be agnostic to the partitioning semantics of the input 
stream. Or, at a minimum, it should be able to detect partition count changes 
and rebalance. This JIRA is just a step towards rebalancing: first, we want to 
be able to detect the change at all. 

> Detect partition count changes in input streams
> -----------------------------------------------
>
>                 Key: SAMZA-882
>                 URL: https://issues.apache.org/jira/browse/SAMZA-882
>             Project: Samza
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>            Reporter: Navina Ramesh
>            Assignee: Navina Ramesh
>             Fix For: 0.10.1
>
>         Attachments: SAMZA-882-0.patch
>
>
> This is a known issue: any change in the partition count of an upstream 
> input stream affects the Samza job, which then needs to be restarted. In 
> such scenarios, we experience data loss or incorrect processing because the 
> application logic depends on the partitioning strategy. This is made worse 
> by the fact that we don't even have a good mechanism to detect such a change. 
> As a first step towards detection, I propose that we modify the stream 
> metadata cache maintained in Samza so that, when there is a change in 
> partition count, we increment a gauge metric. This way we can at least attach 
> a hook to monitor when this happens and take the necessary actions. 
> However, in the long term, we need to come up with a better strategy for 
> handling this. 
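
The detection step proposed in the description above could look roughly like
the sketch below. It is illustrative only and does not use Samza's actual
metadata cache or metrics API: PartitionCountMonitor, fetch_partition_count
and change_gauge are hypothetical names, and the "gauge" is modelled as a
plain per-stream counter. The idea is to remember the last partition count
seen for each stream and bump the gauge whenever a refresh returns a
different count.

```python
from typing import Callable, Dict


class PartitionCountMonitor:
    """Hypothetical helper: tracks the last partition count seen per stream."""

    def __init__(self, fetch_partition_count: Callable[[str], int]) -> None:
        # fetch_partition_count stands in for a metadata lookup against the
        # upstream system. It is an assumption, not a Samza API.
        self._fetch = fetch_partition_count
        self._last_seen: Dict[str, int] = {}
        # Gauge value per stream: number of partition count changes observed.
        self.change_gauge: Dict[str, int] = {}

    def refresh(self, stream: str) -> bool:
        """Re-read metadata for one stream; return True if the count changed."""
        current = self._fetch(stream)
        previous = self._last_seen.get(stream)
        self._last_seen[stream] = current
        self.change_gauge.setdefault(stream, 0)
        # The first refresh only records a baseline; it is not a change.
        if previous is not None and previous != current:
            self.change_gauge[stream] += 1
            return True
        return False
```

A monitoring hook would then alert whenever change_gauge for a stream is
non-zero. What the job should do after such an alert (restart, rebalance) is
outside the scope of this sketch.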



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
