[
https://issues.apache.org/jira/browse/SAMZA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568218#comment-14568218
]
Yan Fang commented on SAMZA-41:
-------------------------------
Hi [~mdaxini], thank you for picking this up and attaching the design doc.
I am a little confused about what you are trying to achieve.
1. Is this change only for LocalJobFactory, or for all Samza deployments?
2. Are you trying to *filter* out some stream partitions? For example, do you
want something like kafka.foo[1-4], which consumes only partitions 1-4 of foo?
If so, your proposal looks good. +1 for that.
3. Are you trying to assign specific partitions to a specific container? The
proposal does not appear to do this. I am asking because the title of
this JIRA suggests it.
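
To make the filtering case in question 2 concrete, here is a minimal Scala sketch of how a spec such as kafka.foo[1-4] could be parsed into a set of partitions. The system.stream[lo-hi] syntax is taken only from the example above, and PartitionFilter, StreamPartitions and parse are hypothetical names, not existing Samza APIs or anything from the design doc.

```scala
object PartitionFilter {
  // Hypothetical spec format: system.stream[lo-hi], e.g. kafka.foo[1-4].
  // The system name may not contain dots; the stream name may.
  private val Spec = """([^.\[\]]+)\.([^\[\]]+)\[(\d{1,9})-(\d{1,9})\]""".r

  // Illustrative holder for the parsed result (not a Samza class).
  case class StreamPartitions(system: String, stream: String, partitions: Set[Int])

  // Returns None when the spec is malformed or the range is reversed.
  def parse(spec: String): Option[StreamPartitions] = spec match {
    case Spec(system, stream, lo, hi) if lo.toInt <= hi.toInt =>
      Some(StreamPartitions(system, stream, (lo.toInt to hi.toInt).toSet))
    case _ => None
  }
}
```

With this sketch, PartitionFilter.parse("kafka.foo[1-4]") would yield partitions 1 through 4 of stream foo on system kafka, while a spec without a bracketed range yields None.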
> Support static partition assignment in LocalJobFactory
> ------------------------------------------------------
>
> Key: SAMZA-41
> URL: https://issues.apache.org/jira/browse/SAMZA-41
> Project: Samza
> Issue Type: Bug
> Components: container
> Affects Versions: 0.6.0
> Reporter: Chris Riccomini
> Labels: project
> Attachments: samza-41-design-doc.md, samza-41-design-doc.pdf
>
>
> LocalJobFactory currently creates a single container (either in ProcessJob or
> ThreadJob) and assigns all partitions to it using:
> {code}
> val partitions = Util.getMaxInputStreamPartitions(config)
> {code}
> This works in the case where you only wish to run a single container that
> processes all messages. There are situations where one container is not
> enough, though. If you aren't using YARN, we don't provide an easy way to run
> multiple containers that split partitions between them. This support would be
> useful for running containers in EC2, for example, where you might run
> two EC2 instances that host Samza containers sharing the
> partitions of a single job.
> Some potential solutions:
> 1. Let developers statically assign partitions in the config file.
> 2. Let developers define a container ID and container count, and let
> LocalJobFactory/ProcessJob/ThreadJob figure out which partitions the
> container should own. For example, a container with id 0 and container count
> 2 would own partitions 0, 2, 4, 6, 8, etc.
> 3. Write a different JobFactory for this case (e.g. EC2JobFactory)
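
Option 2 in the description above amounts to a modulo split of partition IDs across containers. A minimal Scala sketch of that rule follows. StaticAssignment and partitionsFor are hypothetical names used only for illustration, and partitionCount is assumed to be obtained separately (for instance from a call like the Util.getMaxInputStreamPartitions(config) line quoted above).

```scala
object StaticAssignment {
  // Hypothetical helper, not part of Samza: returns the partition IDs that a
  // container owns when partitions are split by (partition % containerCount).
  def partitionsFor(containerId: Int, containerCount: Int, partitionCount: Int): Seq[Int] = {
    require(containerCount > 0, "containerCount must be positive")
    require(containerId >= 0 && containerId < containerCount,
      "containerId must be in [0, containerCount)")
    (0 until partitionCount).filter(_ % containerCount == containerId)
  }
}
```

For 10 partitions and a container count of 2, container 0 would own partitions 0, 2, 4, 6, 8 and container 1 would own 1, 3, 5, 7, 9, which matches the example given in option 2. Every container must be started with the same container count for the split to cover each partition exactly once.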