[ 
https://issues.apache.org/jira/browse/SAMZA-123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981705#comment-13981705
 ] 

Chris Riccomini commented on SAMZA-123:
---------------------------------------

bq. so if we stuff the task-state-partition-mapping into it, we're redefining 
the interface pretty dramatically and it's no longer really a CheckpointManager

I agree. See my comment in (6) above. Perhaps we should rethink how we treat 
config, checkpoints, and SSP/cohort mappings? I think the main question I have 
is: if we had a generalized map topic where we keep config and cohorts, do we 
still need KAFKA-1000? Why split the "job state" between two places? Why not 
just put everything into one place (the topic)?

bq. I don't think it was the right choice with MessageChooser, or here. First, 
you're only a young Incubator project once and it's better to experiment and be 
open and flexible while we can before a larger user base forces more 
conservative choices.

We disagree. This project is being used in production. I want to avoid 
backwards incompatibility as much as possible. The choice we made with 
MessageChooser saved us from having to re-write several jobs. The 
backwards-compatibility issue always seems very small when you make the change, 
and very big in a year's time.

That said, I'm proposing in SAMZA-250 to change the API of SystemConsumer, 
which is backwards incompatible. The main difference between the two is that 
introducing the backwards incompatibility in SAMZA-250 gets us a direct benefit 
(improved performance). Introducing the risk of incompatibility here seems 
unnecessary, since the initial feature set covers a huge swath of the use 
cases, no one is actively asking for it (except us), and we can always open it 
up later by simply adding a config.

bq. Second, nobody thus far has said this feature shouldn't be pluggable, just 
that it maybe shouldn't be pluggable yet.

Yes, that's what I'm saying. Don't make it pluggable yet. Let's let it bake 
before we open it up.

bq. I'm not married to the word cohort, but I am pretty strong about not 
overloading another term already used.

The most consistent feedback on this JIRA is that no one likes cohort. I like 
taskName because that's what the string is. The fact that the task name is 
attached to a set of SSPs is just a very small part of what the name will be 
used for. It'll show up in all metrics, show up in logs, file directories, etc. 
Calling the thing a cohort in those scenarios makes no sense. The task is much 
more than a group of things banded together. It has state, offsets, logic, etc. 
Identifying it as a cohort doesn't make sense to me.

bq. OK, how about CircumstanceLog? PredicamentLog? FootingFile? 

SetupLog? It's essentially a list of changes required to set up a job before it 
starts, right? Not sure how I feel about the name, just spit-balling. :P

> Move topic partition grouping to the AM and generalize
> ------------------------------------------------------
>
>                 Key: SAMZA-123
>                 URL: https://issues.apache.org/jira/browse/SAMZA-123
>             Project: Samza
>          Issue Type: Sub-task
>          Components: container
>    Affects Versions: 0.6.0
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>         Attachments: SAMZA-123-design-doc.md, SAMZA-123-design-doc.pdf
>
>
> Currently the AM sends a set of all the topics and partitions to the 
> container, which then groups them by partition and assigns each set to a task 
> instance. By moving the grouping to the AM, we can assign arbitrary groups to 
> task instances, which will allow more partitioning strategies, as discussed 
> in SAMZA-71.
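
As a rough illustration of the "group by partition" behaviour the description refers to, here is a minimal sketch in Python. It is not Samza code: the `SSP` tuple, the `group_by_partition` function, and the "Partition N" task names are all hypothetical stand-ins chosen for this example. The sketch only shows the idea of mapping a flat set of system/stream/partition triples to named groups, one group per partition number.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class SSP(NamedTuple):
    """Hypothetical stand-in for a system/stream/partition triple."""
    system: str
    stream: str
    partition: int


def group_by_partition(ssps: Iterable[SSP]) -> Dict[str, Set[SSP]]:
    """Group SSPs so that every group holds exactly one partition number.

    The "Partition N" group names are an assumption made for this sketch.
    """
    groups: Dict[str, Set[SSP]] = defaultdict(set)
    for ssp in ssps:
        groups[f"Partition {ssp.partition}"].add(ssp)
    return dict(groups)


if __name__ == "__main__":
    inputs = [
        SSP("kafka", "clicks", 0),
        SSP("kafka", "views", 0),
        SSP("kafka", "clicks", 1),
    ]
    for name, members in sorted(group_by_partition(inputs).items()):
        print(name, sorted(members))
```

Under this sketch, a different partitioning strategy would simply be another function with the same input and output shape, which is the kind of flexibility that moving the grouping to the AM is meant to allow.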



--
This message was sent by Atlassian JIRA
(v6.2#6252)
