[
https://issues.apache.org/jira/browse/SAMZA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568748#comment-14568748
]
Monal Daxini commented on SAMZA-41:
-----------------------------------
The proposal is to only enable the parition assignment using the regex if
ProcessJobFactory or ThreadJobFactory is specified. This would be in line with
the scope of this ticket to support static partition assignment for
LocalJobFactory.
The part that the proposal leaves for later is the ability to run more than 1
container for ProcessJobFactory and ThreadJobFactory as monitoring and managing
additional containers is more involved. Besides multiple container management
is contemplated to be added to the JobCoordinator in the near future.
> Support static partition assignment in LocalJobFactory
> ------------------------------------------------------
>
> Key: SAMZA-41
> URL: https://issues.apache.org/jira/browse/SAMZA-41
> Project: Samza
> Issue Type: Bug
> Components: container
> Affects Versions: 0.6.0
> Reporter: Chris Riccomini
> Labels: project
> Attachments: samza-41-design-doc.md, samza-41-design-doc.pdf
>
>
> LocalJobFactory currently creates a single container (either in ProcessJob or
> ThreadJob) and assigns all partitions to it using:
> {code}
> val partitions = Util.getMaxInputStreamPartitions(config)
> {code}
> This works in the case where you only wish to run a single container that
> processes all messages. There are situations where one container is not
> enough, though. If you aren't using YARN, we don't provide an easy way to run
> multiple containers that split partitions between them. This support would be
> useful for running containers in EC2, for example, where you'd wish to run
> two EC2 instances (for example) that host Samza containers that share
> partitions for a single job.
> Some potential solutions:
> 1. Let developers statically assign partitions in config file.
> 2. Let developers define a container ID and container count, and let
> LocalJobFactory/ProcessJob/ThreadJob figure out which partitions the
> container should own. For example, a container with id 0 and container count
> 2 would own partitions 0, 2, 4, 6, 8, etc.
> 3. Write a different JobFactory for this case (e.g. EC2JobFactory)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)