GitHub user shanthoosh opened a pull request:

    https://github.com/apache/samza/pull/803

    SAMZA-1989: SystemStreamGrouper interface change for SEP-5

    
    Samza users may need to increase the partition count of the input streams 
of their stateful Samza jobs. For example, Kafka needs to limit the maximum 
size of each partition in order to scale. If the average bytes-in rate or the 
retention time of a Kafka topic doubles, the topic's partition count has to be 
increased to bring the partition size back down.
    
    In order to perform a join between streams, stateful jobs generally route 
the partitions from the different input streams to the same task of a 
container. However, when an input stream is repartitioned, the key space of 
each partition gets redistributed. This causes stateful jobs to produce 
erroneous results.
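    
    The redistribution can be illustrated with a minimal sketch. It assumes 
the producer picks a partition as the key hash modulo the partition count; 
that scheme, the class name `RepartitionDemo`, and both method names are 
assumptions made for illustration and are not taken from this patch.

```java
import java.util.ArrayList;
import java.util.List;

public class RepartitionDemo {
    // Assumed partitioning function: key hash modulo the partition count.
    static int partitionFor(int keyHash, int partitionCount) {
        return Math.floorMod(keyHash, partitionCount);
    }

    // Returns the key hashes in [0, keyCount) whose partition changes when
    // the partition count goes from oldCount to newCount.
    static List<Integer> movedKeys(int keyCount, int oldCount, int newCount) {
        List<Integer> moved = new ArrayList<>();
        for (int keyHash = 0; keyHash < keyCount; keyHash++) {
            if (partitionFor(keyHash, oldCount) != partitionFor(keyHash, newCount)) {
                moved.add(keyHash);
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        // Expanding from 2 to 4 partitions moves half of these keys.
        System.out.println(movedKeys(8, 2, 4)); // prints [2, 3, 6, 7]
    }
}
```

    Under this assumption, a task that keeps reading only its original 
partition stops seeing the moved keys, while the local state it built for 
them stays behind.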
    
    So if the partition count of an input stream is increased, users have to 
manually purge the changelog topics and the local RocksDB state of their 
stateful jobs. This results in increased operational complexity and data loss.
    
    This patch takes a first stab at solving the above problem and comprises 
the following changes:
    
    * Introduce a new `group` method in the `SystemStreamPartitionGrouper` 
interface to generate task assignments that factor in the partition expansion 
of input streams.
    * Introduce a `StreamPartitionMapper` abstraction to allow the user to 
plug in the input stream partitioning function.
    * Fix the existing unit tests and add new unit tests to validate the new 
grouper changes.
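    
    The idea behind the first two changes can be sketched as follows. This 
is not the code in this patch: the interface `PartitionMapper`, its method 
`previousPartition`, and the `group` signature are hypothetical, and the 
actual `StreamPartitionMapper` and `SystemStreamPartitionGrouper` signatures 
may differ. The sketch also assumes hash-modulo partitioning and a new 
partition count that is a multiple of the old one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ExpansionAwareGrouping {
    // Hypothetical mapper: for a partition after expansion, returns the
    // partition that held the same keys before expansion.
    interface PartitionMapper {
        int previousPartition(int currentPartition, int previousCount, int currentCount);
    }

    // Valid only under the stated assumptions (hash-modulo partitioning,
    // currentCount a multiple of previousCount).
    static final PartitionMapper MODULO =
        (current, previousCount, currentCount) -> current % previousCount;

    // Assigns every current partition to the task that owned its keys before
    // the expansion, so the existing task state stays usable.
    static Map<Integer, List<Integer>> group(int previousCount, int currentCount,
                                             PartitionMapper mapper) {
        Map<Integer, List<Integer>> tasks = new TreeMap<>();
        for (int partition = 0; partition < currentCount; partition++) {
            int task = mapper.previousPartition(partition, previousCount, currentCount);
            tasks.computeIfAbsent(task, t -> new ArrayList<>()).add(partition);
        }
        return tasks;
    }

    public static void main(String[] args) {
        System.out.println(group(2, 4, MODULO)); // prints {0=[0, 2], 1=[1, 3]}
    }
}
```

    Keeping the mapper pluggable matters because the grouper cannot know how 
the producer partitions its keys; a user with a custom partitioner would 
supply a matching mapper.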
    
    In a follow-up PR shortly, these grouper changes will be integrated with 
`JobModelManager`. (This is waiting for PR 790 to land first, since it makes 
significant changes to `JobModelManager`.)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shanthoosh/samza SEP-5

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/samza/pull/803.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #803
    
----
commit 7f35f895959ac6d4dd0c14a99087ae226bb9bc90
Author: Shanthoosh Venkataraman <spvenkat@...>
Date:   2018-11-07T20:40:25Z

    Initial version of SEP-5.

----

