[ 
https://issues.apache.org/jira/browse/SAMZA-123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985678#comment-13985678
 ] 

Chris Riccomini commented on SAMZA-123:
---------------------------------------

bq. Guys, are you ok with answers provided above, except those still under 
discussion as described below?

Yep, I think so.

bq. GroupIntoNSets is confusing/not very useful as defined. Chris had a good 
idea to set the N there to be the same as the number of containers. This is 
likely its most common use case. This will move the code into some package 
that's aware of YARN since it requires a YARN-specific config (or we need to 
change the yarn container count config to be more generic, but that's out of 
scope here).

I still feel like the strategy is just a bit weird. As I understand it, the 
reason that it exists is just to reduce the number of TIs (vs. just having one 
TI per-partition, and spreading the TIs evenly amongst containers). TIs are 
cheap, though. The only issue we've seen (that I can recall) with having a lot 
of TIs is that this can cause an explosion of the number of metrics that are 
reported. In the case of the JmxReporter and MetricsSnapshotReporter, there 
isn't any problem with this. Some other systems can have problems with the 
number of metrics being reported, but I think there are better solutions to 
that problem than introducing a grouping strategy to reduce the number of TIs. 
Better solutions to this problem would be to filter (either with a debug-level, 
or a black list) the number of metrics, or just use metrics infrastructure that 
can handle 10k-100k metrics/job. I don't think we need this strategy.

bq. The bookkeeping necessary to support state and these new features should be 
kept away from the state log itself and moved into some central location with 
other per-job info (whatever form that takes)

Agree.

bq. I'm not wild about shard, as I think that's a pretty key term for databases 
and may lead people to think we this grouping functions the same way.

Agree.

bq. Task Name has a pretty big flaw it conflates these tasks with map-reduce 
tasks, when in fact our closest analog is the Samza container.

The issue I take with adding *any* new word is that we *already* have the word 
"task" deeply ingrained in Samza (StreamTask, TaskInstance, 
TaskLifecycleListener, TaskContext, TaskCoordinator, TaskStorageManager, etc). 
Some of these are public user-facing highly visible APIs. The fact is, we have 
a task, and it means a certain thing. It's documented. Talks have been given on 
it. Adding a *new* word to describe a task, while still keeping the word" task" 
everywhere else, just makes things worse, not better.

If we want to open up a separate discussion about refactoring everything to 
change the task stuff to something else, we can discuss that, but I see it as 
an orthogonal discussion to this issue. Simply adding "cohort" for one thing in 
this ticket, which diverges from the entire rest of the code-base, seems wrong 
to me. I remain pretty strongly convinced that taskName is in keeping with the 
rest of the code base as it exists today.

bq. Essentially, it's a chicken-egg thing - people won't use this feature and 
find its limitations/build cooler things with Samza, if it's not a feature we 
publicize and make available.

I can live with this.

bq. This approach seems like the smallest necessary change to implement the 
current JIRA and doesn't expose anything that can't be changed in the future, 
absent more discussion.

Broadly, I agree with you. I think the ConfigLog is totally something we should 
explore in a separate ticket to figure out what stuff should be added to it, 
how it would work, etc. My main concern here is about the sequencing of things. 
The smallest necessary change to implement the current JIRA is actually just to 
add the information to the existing checkpoint.

Here's my concern with adding the ConfigLog as part of this JIRA: we don't 
think the implications of the ConfigLog all the way through because we want to 
limit the scope of this JIRA and finish it faster (with fewer code changes/less 
risk). As a result, we now have two topics: ConfigLog and the checkpoint topic. 
Then, we open up a second ConfigLog JIRA, and discover some flaw with it, or 
decide we want a different design. Now we have two topics to migrate 
(checkpoint log, and config log), as well as two different chunks of code to 
update/delete, and three different versions of a job (one with just 
checkpoints, one with checkpoint and config log from this jira, and one with 
checkpoint and config log from new jira). This is complicated.

This concern leads me to want to just put the cohort/SSP and state mapping 
stuff into the existing Checkpoint class and use our existing 
CheckpointManager. As a follow on JIRA, we can do a nice detailed design doc on 
the ConfigLog, and think through things in more detail. Since we'll have to 
migrate the checkpoint no matter what, sticking the cohort/SSP stuff into the 
checkpoint shouldn't introduce any additional work, and not committing to a 
ConfigLog now will mean we'll feel more comfortable thinking openly about how 
exactly it should be implemented (rather than being tied to the implementation 
that falls out of this JIRA, which might or might not be the best approach when 
we dig into it in more detail).

> Move topic partition grouping to the AM and generalize
> ------------------------------------------------------
>
>                 Key: SAMZA-123
>                 URL: https://issues.apache.org/jira/browse/SAMZA-123
>             Project: Samza
>          Issue Type: Sub-task
>          Components: container
>    Affects Versions: 0.6.0
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>         Attachments: SAMZA-123-design-doc.md, SAMZA-123-design-doc.pdf
>
>
> Currently the AM sends a set of all the topics and partitions to the 
> container, which then groups them by partition and assigns each set to a task 
> instance. By moving the grouping to the AM, we can assign arbitrary groups to 
> task instances, which will allow more partitioning strategies, as discussed 
> in SAMZA-71.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to