[jira] [Commented] (SAMZA-348) Configure Samza jobs through a stream

Chinmay Soman (JIRA) Mon, 15 Sep 2014 12:14:22 -0700

    [ 
https://issues.apache.org/jira/browse/SAMZA-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134337#comment-14134337
 ]


Chinmay Soman commented on SAMZA-348:
-------------------------------------

bq. I was thinking the source of truth would be the underlying stream, since 
this is what the job coordinator will use to run the job. Whether the config is 
mutated from the AM web UI, or from a CLI, I haven't considered very much. Maybe 
you're trying to get at the idea that we could try and funnel all mutations to 
the ConfigStream through a single writer?

The source of truth is the stream. However, if many modifications are made 
(either manually or automatically), the user might lose track of what the 
exact config is. So yes, funneling all mutations through a single writer (like 
the AM) might add value, because:
* We can reflect the current config accurately. For example, if within 
LinkedIn the user only modifies the config via cfg2, there is extra overhead 
in keeping that in sync with the actual config, since config mutations might 
also be done via the AM.
* We avoid the concurrency issues that come with multiple writers.
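
To make the single-writer idea concrete, here is a minimal sketch. It is only 
an illustration under stated assumptions: the stream is an in-memory list 
rather than a real topic, and the names ConfigStream and SingleWriter, their 
methods, and the sample config keys are hypothetical, not existing Samza APIs. 
It shows every mutation going through one writer, so the writer's view always 
matches what a replay of the stream produces.

```python
import threading
from typing import Dict, List, Optional, Tuple

# Hypothetical names throughout; none of these are Samza APIs.
# A message is (key, value); a value of None means "delete this key".
ConfigMessage = Tuple[str, Optional[str]]


class ConfigStream:
    """In-memory stand-in for an append-only config stream."""

    def __init__(self) -> None:
        self._log: List[ConfigMessage] = []

    def append(self, message: ConfigMessage) -> int:
        self._log.append(message)
        return len(self._log) - 1  # offset of the appended message

    def replay(self) -> Dict[str, str]:
        """Fold the log into the current config; later messages win."""
        config: Dict[str, str] = {}
        for key, value in self._log:
            if value is None:
                config.pop(key, None)
            else:
                config[key] = value
        return config


class SingleWriter:
    """The only component allowed to append to the stream (e.g. the AM)."""

    def __init__(self, stream: ConfigStream) -> None:
        self._stream = stream
        self._lock = threading.Lock()  # serializes all mutations
        self._view = stream.replay()   # writer's copy of the current config

    def set(self, key: str, value: str) -> int:
        with self._lock:
            offset = self._stream.append((key, value))
            self._view[key] = value
            return offset

    def delete(self, key: str) -> int:
        with self._lock:
            offset = self._stream.append((key, None))
            self._view.pop(key, None)
            return offset

    def current_config(self) -> Dict[str, str]:
        with self._lock:
            return dict(self._view)


if __name__ == "__main__":
    stream = ConfigStream()
    writer = SingleWriter(stream)
    writer.set("task.inputs", "kafka.page-views")  # sample values only
    writer.set("yarn.container.count", "2")
    writer.set("yarn.container.count", "4")
    writer.delete("task.inputs")
    # The writer's view and a fresh replay of the stream agree.
    assert writer.current_config() == stream.replay()
    print(writer.current_config())
```

The stream stays the source of truth here: the writer's view is just a cache 
that can be rebuilt at any time by replaying the stream. A web UI or CLI would 
call the writer instead of appending to the stream directly.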

> Configure Samza jobs through a stream
> -------------------------------------
>
>                 Key: SAMZA-348
>                 URL: https://issues.apache.org/jira/browse/SAMZA-348
>             Project: Samza
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Chris Riccomini
>              Labels: project
>         Attachments: DESIGN-SAMZA-348-0.md, DESIGN-SAMZA-348-0.pdf
>
>
> Samza's existing config setup is problematic for a number of reasons:
> # It's completely immutable once a job starts. This prevents any dynamic 
> reconfiguration and auto-scaling. It is debatable whether we want these 
> features or not, but our existing implementation actively prevents it. See 
> SAMZA-334 for discussion.
> # We pass existing configuration through environment variables. YARN exports 
> environment variables in a shell script, which limits the size to the varargs 
> length on the machine. This is usually ~128KB. See SAMZA-333 and SAMZA-337 
> for details.
> # User-defined configuration (the Config object) and programmatic 
> configuration (checkpoints and TaskName:State mappings (see SAMZA-123)) are 
> handled differently. It's debatable whether this makes sense.
> In SAMZA-123, [~jghoman] and I propose implementing a ConfigLog. This log 
> would replace both the checkpoint topic and the existing config environment 
> variables in SamzaContainer and Samza's YARN AM.
> I'd like to keep this ticket's scope limited to just the implementation of 
> the ConfigLog, and not re-designing how Samza's config is used in the code 
> (SAMZA-40). We should, however, discuss how this feature would affect dynamic 
> reconfiguration/auto-scaling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
