[
https://issues.apache.org/jira/browse/SAMZA-123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981705#comment-13981705
]
Chris Riccomini commented on SAMZA-123:
---------------------------------------
bq. so if we stuff the task-state-partition-mapping into it, we're redefining
the interface pretty dramatically and it's no longer really a CheckpointManager
I agree. See my comment in (6) above. Perhaps we should rethink how we treat
config, checkpoints, and SSP/cohort mappings? I think the main question I have
is: if we had a generalized map topic where we keep config and cohorts, do we
still need KAFKA-1000? Why split the "job state" between two places? Why not
just put everything into one place (the topic)?
bq. I don't think it was the right choice with MessageChooser, or here. First,
you're only a young Incubator project once and it's better to experiment and be
open and flexible while we can before a larger user base forces more
conservative choices.
We disagree. This project is being used in production. I want to avoid
backwards incompatibility as much as possible. The choice we made with
MessageChooser saved us from having to re-write several jobs. The
backwards-compatibility issue always seems very small when you make the change,
and very big in a year's time.
That said, I'm proposing in SAMZA-250 to change the API of SystemConsumer,
which is backwards incompatible. The main difference between the two is that
introducing the backwards incompatibility in SAMZA-250 gets us a direct benefit
(improved performance). Introducing the risk of incompatibility here seems
unnecessary, since the initial feature set covers a huge swath of the use
cases, no one is actively asking for it (except us), and we can always open it
up later by simply adding a config.
bq. Second, nobody thus far has said this feature shouldn't be pluggable, just
that it maybe shouldn't be pluggable yet.
Yes, that's what I'm saying. Don't make it pluggable yet. Let's let it bake
before we open it up.
bq. I'm not married to the word cohort, but I am pretty strong about not
overloading another term already used.
The most consistent feedback on this JIRA is that no one likes cohort. I like
taskName because that's what the string is. The fact that the task name is
attached to a set of SSPs is just a very small part of what the name will be
used for. It'll show up in all metrics, show up in logs, file directories, etc.
Calling the thing a cohort in those scenarios makes no sense. The task is much
more than a group of things banded together. It has state, offsets, logic, etc.
Identifying it as a cohort doesn't make sense to me.
bq. OK, how about CircumstanceLog? PredicamentLog? FootingFile?
SetupLog? It's essentially a list of changes required to set up a job before it
starts, right? Not sure how I feel about the name, just spit-balling. :P
> Move topic partition grouping to the AM and generalize
> ------------------------------------------------------
>
> Key: SAMZA-123
> URL: https://issues.apache.org/jira/browse/SAMZA-123
> Project: Samza
> Issue Type: Sub-task
> Components: container
> Affects Versions: 0.6.0
> Reporter: Jakob Homan
> Assignee: Jakob Homan
> Attachments: SAMZA-123-design-doc.md, SAMZA-123-design-doc.pdf
>
>
> Currently the AM sends a set of all the topics and partitions to the
> container, which then groups them by partition and assigns each set to a task
> instance. By moving the grouping to the AM, we can assign arbitrary groups to
> task instances, which will allow more partitioning strategies, as discussed
> in SAMZA-71.