[jira] [Commented] (SAMZA-348) Configure Samza jobs through a stream

Chris Riccomini (JIRA) Fri, 10 Oct 2014 15:12:08 -0700

    [ https://issues.apache.org/jira/browse/SAMZA-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167637#comment-14167637 ]


Chris Riccomini commented on SAMZA-348:
---------------------------------------

bq. We could do something as simple as Class.forName(uri.getScheme() + 
"SystemFactory").newInstance(). This seems a bit hacky and dangerous, but 
should work, and maintains pluggability.

Rather than this, we could just add an extra switch to the CLI to specify a
system factory, and ship built-in defaults for URIs with the file:// and
kafka:// schemes. This seems a bit less hacky, and should still work out of the
box for most folks.
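
A minimal sketch of the lookup proposed above. It assumes a hypothetical CLI switch that passes a factory class name: an explicit class always wins, and otherwise the URI scheme selects a built-in default for file:// or kafka://. The class names in DEFAULTS, and ConfigStreamFactoryResolver itself, are placeholders for illustration, not existing Samza classes.

```java
import java.net.URI;
import java.util.Map;

/**
 * Sketch: choose a config-stream factory class for a URI.
 * An explicitly supplied class (e.g. from a hypothetical CLI switch) wins;
 * otherwise built-in defaults cover the file:// and kafka:// schemes.
 * All factory class names here are hypothetical placeholders.
 */
public class ConfigStreamFactoryResolver {
    // Hypothetical default factories, keyed by URI scheme.
    static final Map<String, String> DEFAULTS = Map.of(
        "file", "org.example.config.FileConfigStreamFactory",
        "kafka", "org.example.config.KafkaConfigStreamFactory");

    /** Returns the factory class name to use for the given URI. */
    static String resolve(URI uri, String factoryOverride) {
        if (factoryOverride != null && !factoryOverride.isEmpty()) {
            return factoryOverride; // explicit switch always wins
        }
        String scheme = uri.getScheme();
        String className = (scheme == null) ? null : DEFAULTS.get(scheme);
        if (className == null) {
            throw new IllegalArgumentException(
                "No default factory for scheme '" + scheme + "'; pass one explicitly");
        }
        return className;
    }

    /** Loads the class reflectively; it needs an accessible no-arg constructor. */
    static Object instantiate(String className) throws ReflectiveOperationException {
        return Class.forName(className).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) {
        System.out.println(resolve(URI.create("file:///tmp/job.properties"), null));
        System.out.println(resolve(URI.create("hdfs://nn/config"), "com.acme.HdfsFactory"));
    }
}
```

Keeping the defaults in an explicit map, rather than deriving a class name from the scheme string, means an unknown scheme fails with a clear message instead of a ClassNotFoundException from Class.forName, while the override keeps the mechanism pluggable.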

In addition, because the control-job.sh script and the job coordinator will both
want to read from and write to the ConfigStream, we'll have to provide both the
broker metadata list and the ZK path in the URI. This can probably be done with
something like:

{noformat}
kafka://<broker-list>:<broker ports>?zk=<zk-list>:<zk-port>
kafka://192.168.0.1,192.168.0.2:9192?zk=192.168.0.1,192.168.0.2,192.168.0.3:2181
{noformat}
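
A sketch of how such a URI could be split into a broker list and a ZooKeeper connect string, assuming a single port shared by every host in a list, as in the example. Because a comma-separated host list is not a valid server-based authority, java.net.URI accepts it only as a registry-based authority: getHost() returns null, so the raw authority has to be split by hand. KafkaConfigStreamUri is a hypothetical name.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: splits kafka://<broker-list>:<broker-port>?zk=<zk-list>:<zk-port>
 * into broker addresses and a ZooKeeper connect string.
 * Assumes one port shared by all hosts in each list (hypothetical format).
 */
public class KafkaConfigStreamUri {
    final List<String> brokers;   // e.g. [192.168.0.1:9192, 192.168.0.2:9192]
    final String zkConnect;       // e.g. "192.168.0.1:2181,192.168.0.2:2181,..."

    KafkaConfigStreamUri(List<String> brokers, String zkConnect) {
        this.brokers = brokers;
        this.zkConnect = zkConnect;
    }

    static KafkaConfigStreamUri parse(String text) {
        URI uri = URI.create(text);
        if (!"kafka".equals(uri.getScheme())) {
            throw new IllegalArgumentException("expected a kafka:// URI: " + text);
        }
        // A comma-separated host list is kept as a registry-based authority,
        // so getHost() is null; split the raw authority ourselves.
        List<String> brokers = expand(uri.getRawAuthority());

        String zkConnect = null;
        String query = uri.getRawQuery();
        if (query != null) {
            for (String param : query.split("&")) {
                if (param.startsWith("zk=")) {
                    zkConnect = String.join(",", expand(param.substring("zk=".length())));
                }
            }
        }
        if (zkConnect == null) {
            throw new IllegalArgumentException("missing zk=<zk-list>:<zk-port> in " + text);
        }
        return new KafkaConfigStreamUri(brokers, zkConnect);
    }

    /** Expands "h1,h2:port" into [h1:port, h2:port]. */
    static List<String> expand(String hostsAndPort) {
        int colon = (hostsAndPort == null) ? -1 : hostsAndPort.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected <hosts>:<port>, got " + hostsAndPort);
        }
        String port = hostsAndPort.substring(colon + 1);
        List<String> hosts = new ArrayList<>();
        for (String host : hostsAndPort.substring(0, colon).split(",")) {
            hosts.add(host + ":" + port);
        }
        return hosts;
    }

    public static void main(String[] args) {
        KafkaConfigStreamUri u = parse(
            "kafka://192.168.0.1,192.168.0.2:9192?zk=192.168.0.1,192.168.0.2,192.168.0.3:2181");
        System.out.println(u.brokers);
        System.out.println(u.zkConnect);
    }
}
```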

It's ugly, but it's not our fault. Neither of these systems is represented
well by URI schemes, since a standard URI authority holds only a single host
and port.

Eventually, the need for a ZK path should go away, when Kafka finishes moving 
all of its ZK dependencies behind a broker protocol.

> Configure Samza jobs through a stream
> -------------------------------------
>
>                 Key: SAMZA-348
>                 URL: https://issues.apache.org/jira/browse/SAMZA-348
>             Project: Samza
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Chris Riccomini
>              Labels: project
>         Attachments: DESIGN-SAMZA-348-0.md, DESIGN-SAMZA-348-0.pdf, 
> DESIGN-SAMZA-348-1.md, DESIGN-SAMZA-348-1.pdf
>
>
> Samza's existing config setup is problematic for a number of reasons:
> # It's completely immutable once a job starts. This prevents any dynamic 
> reconfiguration and auto-scaling. It is debatable whether we want these 
> features or not, but our existing implementation actively prevents them. See 
> SAMZA-334 for discussion.
> # We pass existing configuration through environment variables. YARN exports 
> environment variables in a shell script, which limits their size to the 
> maximum command-line argument length on the machine. This is usually ~128KB. 
> See SAMZA-333 and SAMZA-337 for details.
> # User-defined configuration (the Config object) and programmatic 
> configuration (checkpoints and TaskName:State mappings (see SAMZA-123)) are 
> handled differently. It's debatable whether this makes sense.
> In SAMZA-123, [~jghoman] and I propose implementing a ConfigLog. This log 
> would replace both the checkpoint topic and the existing config environment 
> variables in SamzaContainer and Samza's YARN AM.
> I'd like to keep this ticket's scope limited to just the implementation of 
> the ConfigLog, and not re-designing how Samza's config is used in the code 
> (SAMZA-40). We should, however, discuss how this feature would affect dynamic 
> reconfiguration/auto-scaling.
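
To illustrate why a log leaves room for reconfiguration, here is a sketch of rebuilding a job's current config by replaying a ConfigLog from the beginning. The message model is an assumption, not the design in the attached documents: each entry is a (key, value) pair, later entries override earlier ones, and a null value deletes a key. ConfigLogReplay and Entry are hypothetical names, and the config keys are only illustrative.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch: materialize current config by replaying a keyed log.
 * Hypothetical model: later entries override earlier ones for the
 * same key, and a null value acts as a tombstone that deletes the key.
 */
public class ConfigLogReplay {
    static final class Entry {
        final String key;
        final String value; // null means "delete this key"

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    static Map<String, String> replay(List<Entry> log) {
        Map<String, String> config = new LinkedHashMap<>(); // keeps first-insertion order
        for (Entry e : log) {
            if (e.value == null) {
                config.remove(e.key);
            } else {
                config.put(e.key, e.value);
            }
        }
        return config;
    }

    public static void main(String[] args) {
        List<Entry> log = List.of(
            new Entry("job.name", "wikipedia-feed"),
            new Entry("yarn.container.count", "2"),
            new Entry("yarn.container.count", "4"), // later write wins
            new Entry("task.opts", "-Xmx1g"),
            new Entry("task.opts", null));          // tombstone removes the key
        System.out.println(replay(log)); // prints {job.name=wikipedia-feed, yarn.container.count=4}
    }
}
```

Because config read this way comes from the log rather than from environment variables, its size is not bounded by the shell's argument-length limit, and appending an entry is enough to change what a replaying reader sees. Whether and how a running job should react to such a change is the open question the ticket raises.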



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
