GitHub user shanthoosh opened a pull request:
https://github.com/apache/samza/pull/874
SAMZA-2058: Integrate the input partition expansion aware
SystemStreamGrouper to JobModel generation flow.
SAMZA-1989 added a partition expansion aware SystemStreamPartitionGrouper
in samza. This PR aims at integrating the SystemStreamGrouper with the job
model generation workflow of samza
and make it work for both the yarn and standalone deployment models.
**Changes:**
1. Addition of TaskPartitionAssignmentManager to store the task to
partition assignments present in JobModel to the underlying metadata store.
This is essential in persisting the Task to SystemStreamPartition assignments
for the previous run of a samza job. Currently samza-yarn stores the metadata
for a execution of a job in coordinator stream. Maximum supported kafka message
size within LI is 1 MB. This limitation drove the decision to denormalize the
task to SystemStreamPartition Map into individual messages and store in the
coordinator stream.
2. Used the existing Coordinator stream json serde to deserialize/serialize
the task to partition assigments to raw bytes before reading/writing into
coordinator stream.
3. Changes in JobModelManager to integrate the input partition expansion
aware SSPGrouper changes.
4. Code/JavaDoc cleanup done in MetadataStore utility classes.
**Testing**:
1. Added new unit-tests for all the newly added classes and fixed the
existing unit-tests depending upon the changes.
2. Standalone: Wrote few integration tests in TestZkLocalApplcationRunner
for standalone to test input stream partition expansion.
3. YARN: Tested this patch with a sample stream-to-table join high-level
job from samza-hello-samza. Here're the relevant logs:
https://gist.github.com/shanthoosh/07357bb615d9cbbfa23cc02b98c9d142, which
verifies that the AM is restarted on partition expansion of input stream and
correct task to partition assignments are generated.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/shanthoosh/samza SEP-5_left-over
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/samza/pull/874.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #874
----
commit 7bedd46ba98ebb18bdcbf6e3feace7188ac9af20
Author: Shanthoosh Venkataraman <spvenkat@...>
Date: 2018-12-07T02:04:18Z
Initial commit.
----
---