[ 
https://issues.apache.org/jira/browse/SAMZA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569732#comment-14569732
 ] 

Yi Pan (Data Infrastructure) commented on SAMZA-41:
---------------------------------------------------

[~mdaxini], the proposal looks good to me overall. I just have the following 
minor comments:

# Can we name it SystemStreamPartitionMatcher? Per [~closeuris]'s comments, 
"filter" can easily lead to the interpretation that this class filters *out* 
partitions, whereas it actually *matches* partitions to the local job. I 
would also prefer to change job.systemstreampartition.filterClass to 
job.systemstreampartition.matcher.class, to follow our Samza config 
convention.
# I would suggest changing the matcher's configuration keys to 
job.systemstreampartition.matcher.config.*, which also follows the convention 
we already use 
[here|http://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html#regex-rewriter]
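
To make the two key names above concrete, here is a minimal sketch of how a job might read them. It is only an illustration: a plain Map stands in for Samza's Config, and the MatcherConfig object and its method names are hypothetical, not part of the proposal.
{code}
// Hypothetical sketch: a plain Map stands in for Samza's Config.
object MatcherConfig {
  // Key names as suggested in the comments above.
  val ClassKey = "job.systemstreampartition.matcher.class"
  val ConfigPrefix = "job.systemstreampartition.matcher.config."

  // Class name of the matcher, if one is configured.
  def matcherClass(config: Map[String, String]): Option[String] =
    config.get(ClassKey)

  // All entries under the matcher prefix, with the prefix stripped.
  def matcherSettings(config: Map[String, String]): Map[String, String] =
    config.collect {
      case (key, value) if key.startsWith(ConfigPrefix) =>
        key.stripPrefix(ConfigPrefix) -> value
    }
}
{code}
With this layout the matcher implementation only sees its own settings, and unrelated job.* keys are ignored.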

Thanks!

> Support static partition assignment in LocalJobFactory
> ------------------------------------------------------
>
>                 Key: SAMZA-41
>                 URL: https://issues.apache.org/jira/browse/SAMZA-41
>             Project: Samza
>          Issue Type: Bug
>          Components: container
>    Affects Versions: 0.6.0
>            Reporter: Chris Riccomini
>              Labels: project
>         Attachments: samza-41-design-doc.md, samza-41-design-doc.pdf
>
>
> LocalJobFactory currently creates a single container (either in ProcessJob or 
> ThreadJob) and assigns all partitions to it using:
> {code}
> val partitions = Util.getMaxInputStreamPartitions(config)
> {code}
> This works in the case where you only wish to run a single container that 
> processes all messages. There are situations where one container is not 
> enough, though. If you aren't using YARN, we don't provide an easy way to run 
> multiple containers that split partitions between them. This support would be 
> useful for running containers in EC2, for example, where you might wish to 
> run two EC2 instances that host Samza containers that share partitions for a 
> single job.
> Some potential solutions:
> 1. Let developers statically assign partitions in config file.
> 2. Let developers define a container ID and container count, and let 
> LocalJobFactory/ProcessJob/ThreadJob figure out which partitions the 
> container should own. For example, a container with id 0 and container count 
> 2 would own partitions 0, 2, 4, 6, 8, etc.
> 3. Write a different JobFactory for this case (e.g. EC2JobFactory)
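
For reference, option 2 in the quoted description amounts to a modulo assignment of partitions to containers. A minimal sketch follows, assuming partitions are numbered from 0; the PartitionAssignment object and its parameter names are hypothetical and not part of Samza.
{code}
// Hypothetical helper, not part of Samza: round-robin partition ownership.
object PartitionAssignment {
  // Returns the partition ids in [0, partitionCount) owned by the container.
  def ownedPartitions(containerId: Int,
                      containerCount: Int,
                      partitionCount: Int): Seq[Int] = {
    require(containerCount > 0, "containerCount must be positive")
    require(containerId >= 0 && containerId < containerCount,
      "containerId must be in [0, containerCount)")
    // A partition belongs to the container whose id equals p mod containerCount.
    (0 until partitionCount).filter(_ % containerCount == containerId)
  }
}
{code}
Calling PartitionAssignment.ownedPartitions(0, 2, 10) gives partitions 0, 2, 4, 6, 8, which matches the example in the description; container 1 would get the odd-numbered partitions.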



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
