[ 
https://issues.apache.org/jira/browse/SAMZA-717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638667#comment-14638667
 ] 

József Márton Jung commented on SAMZA-717:
------------------------------------------

Thanks guys for the comments. {{TaskGrouper}} was renamed back to 
{{TaskNameGrouper}}. The code is moved back to {{samza-core}}. 
[^SAMZA-717.1.patch] contains these modifications. RB has been updated as well.

[~nickpan47], maybe [~closeuris] has some use-cases where a custom 
task-to-container assignment is needed. To be honest, I just implemented the 
ticket, but I don't have any concrete use case where this is needed. :-)

>From the user point of view, here is an example on using a customized 
>{{TaskNameGrouper}}:
* Two interfaces should be implemented, {{TaskNameGrouper}} and 
{{TaskNameGrouperFactory}}
{code:java}
public class MyTaskNameGrouper implements TaskNameGrouper {

    public MyTaskNameGrouper(int property1, String property2) {
    }

    @Override
    public Set<ContainerModel> group(Set<TaskModel> taskModels) {
        // grouping implementation
    }
}
{code}
{code:java}
public class MyTaskNameGrouperFactory implements TaskNameGrouperFactory {
    @Override
    public TaskNameGrouper build(Config config) {
        return new 
MyTaskNameGrouper(config.getInt("task.name.grouper.property1"), 
config.getString("task.name.grouper.property2"));
    }
}
{code}
* In the properties file:
{noformat}
# fully qualified name of the class
task.name.grouper.factory=com.levi9.samza.example.twitter.task.grouper.MyTaskNameGrouperFactory
# custom properties which should be specified by the user - these values can be 
read from Config in the factory class
task.name.grouper.property1=1
task.name.grouper.property2=example
{noformat}

> Expose the TaskNameGrouper API
> ------------------------------
>
>                 Key: SAMZA-717
>                 URL: https://issues.apache.org/jira/browse/SAMZA-717
>             Project: Samza
>          Issue Type: New Feature
>            Reporter: Yan Fang
>            Assignee: József Márton Jung
>            Priority: Minor
>             Fix For: 0.10.0
>
>         Attachments: SAMZA-717.0.patch, SAMZA-717.1.patch
>
>
> We now are using the 
> [GroupByContainerCount|https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/grouper/task/GroupByContainerCount.scala]
>  that extends 
> [TaskNameGrouper|https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/grouper/task/TaskNameGrouper.scala]
>  to assign TaskModels to ContainerModels (equivalent to assign tasks to 
> different containers in YARN world).
> I think it also makes sense that we expose the TaskNameGrouper as an API that 
> users can use to implement how they want to assign the TaskModels to the 
> ContainerModels. 
> This is useful when users have knowledge about the throughput of their 
> streams because we are sharing the consumers for all the taskIntances in one 
> container. One use case is that users want to put (partition-1, partition-3), 
> (partition-2, partition-4) instead of (partition-1, partition-2), 
> (partition-3, partition-4), which is current strategy. Because partition-1 
> and partition-2 both have a lot of messages coming, while partition-3 and 
> partition-4 have fewer messages coming. Of course, when users have enough 
> containers (same number as the task number) or all the partitions are equally 
> divided, this feature is useless.
> What do you guys think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to