[
https://issues.apache.org/jira/browse/SAMZA-717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637311#comment-14637311
]
Yi Pan (Data Infrastructure) commented on SAMZA-717:
----------------------------------------------------
[~jjung], sorry to chime in late. I think this feature may soon be made obsolete
by the new standalone model with a coordinated partition manager. Essentially,
this would be part of the partition manager's job when it decides which
standalone Samza container a given group of partitions is placed on.
Hence, here are my opinions:
# Is there an immediate, common use case that requires a customized
task-to-container assignment policy? If not, we may want to hold off on this and
move the function into the new partition manager design.
# Even if we need it now for some use cases, I would agree with [~closeuris]
that we should not expose it as an official API in samza-api, just to avoid
having to deprecate an API later.
Thanks!
-Yi
> Expose the TaskNameGrouper API
> ------------------------------
>
> Key: SAMZA-717
> URL: https://issues.apache.org/jira/browse/SAMZA-717
> Project: Samza
> Issue Type: New Feature
> Reporter: Yan Fang
> Assignee: József Márton Jung
> Priority: Minor
> Fix For: 0.10.0
>
> Attachments: SAMZA-717.0.patch
>
>
> We currently use the
> [GroupByContainerCount|https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/grouper/task/GroupByContainerCount.scala],
> which extends
> [TaskNameGrouper|https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/grouper/task/TaskNameGrouper.scala],
> to assign TaskModels to ContainerModels (equivalent to assigning tasks to
> different containers in the YARN world).
> I think it also makes sense that we expose the TaskNameGrouper as an API that
> users can use to implement how they want to assign the TaskModels to the
> ContainerModels.
> This is useful when users know the throughput of their streams, because all
> the TaskInstances in one container share the same consumers. One use case is
> that users want to group (partition-1, partition-3), (partition-2, partition-4)
> instead of (partition-1, partition-2), (partition-3, partition-4), which is the
> current strategy, because partition-1 and partition-2 both receive a lot of
> messages while partition-3 and partition-4 receive fewer. Of course, when users
> have enough containers (as many as there are tasks) or the load is evenly
> divided across all the partitions, this feature is useless.
> What do you guys think?
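
For illustration only, the grouping described in the issue could be sketched in plain Java as below. This is not the Samza TaskNameGrouper interface: the class name ThroughputAwareGrouper, the method group, and the per-partition message rates are all hypothetical, and the greedy "busiest partition goes to the least-loaded container" rule is just one possible policy a user might plug in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Samza: spreads partitions over containers by estimated load. */
final class ThroughputAwareGrouper {

    private ThroughputAwareGrouper() {}

    /**
     * @param messagesPerSecond assumed, user-supplied throughput estimate per partition name
     * @param containerCount    number of containers to fill
     * @return one list of partition names per container
     */
    static List<List<String>> group(Map<String, Long> messagesPerSecond, int containerCount) {
        if (containerCount <= 0) {
            throw new IllegalArgumentException("containerCount must be positive");
        }

        // Busiest partitions first; ties are broken by name so the result is deterministic.
        List<Map.Entry<String, Long>> byLoad = new ArrayList<>(messagesPerSecond.entrySet());
        byLoad.sort((a, b) -> {
            int byRate = Long.compare(b.getValue(), a.getValue());
            return byRate != 0 ? byRate : a.getKey().compareTo(b.getKey());
        });

        List<List<String>> containers = new ArrayList<>();
        for (int i = 0; i < containerCount; i++) {
            containers.add(new ArrayList<>());
        }
        long[] load = new long[containerCount];

        // Greedy step: each partition goes to the container with the smallest load so far
        // (the lowest index wins when loads are equal).
        for (Map.Entry<String, Long> entry : byLoad) {
            int target = 0;
            for (int i = 1; i < containerCount; i++) {
                if (load[i] < load[target]) {
                    target = i;
                }
            }
            containers.get(target).add(entry.getKey());
            load[target] += entry.getValue();
        }
        return containers;
    }
}
```

With made-up rates of 100 messages per second for partition-1 and partition-2 and 10 for partition-3 and partition-4, calling group with two containers yields (partition-1, partition-3) and (partition-2, partition-4), which is the placement the issue asks for.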
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)