[
https://issues.apache.org/jira/browse/SAMZA-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169429#comment-14169429
]
Chris Riccomini commented on SAMZA-348:
---------------------------------------
bq. Do you have an estimate of how big these payloads are likely to get?
They can exceed 50K for jobs with many input SSPs per task. In general,
though, they shouldn't be that large.
bq. I mildly prefer the K:V format as it's a bit cleaner, and transactionality
is much needed anyway.
Personally, I agree. I think we should switch to k/v once we have
transactionality, but in the meantime, I'm just keeping the proposal in line
with what we already do (single message for all offsets for a single task).
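To make the trade-off concrete, here is a rough sketch of the two layouts. It is only an illustration: the key shapes, the JSON encoding, the task name, and the SSP strings are all assumptions made for the example, not the actual checkpoint wire format.

```python
import json

# Hypothetical offsets for one task; the SSP names are made up for illustration.
offsets = {
    "kafka.page-views.0": "1042",
    "kafka.page-views.1": "998",
}


def single_message(task_name, offsets):
    """Single-message layout: one message holds every SSP offset for the task."""
    key = json.dumps({"task": task_name})
    value = json.dumps(offsets, sort_keys=True)
    return [(key, value)]


def key_value_messages(task_name, offsets):
    """K:V layout: one small message per (task, SSP) pair."""
    return [
        (json.dumps({"task": task_name, "ssp": ssp}, sort_keys=True), offset)
        for ssp, offset in sorted(offsets.items())
    ]


single = single_message("Partition 0", offsets)
per_key = key_value_messages("Partition 0", offsets)
```

In the single-message layout the payload grows with the number of SSPs per task, but a task's offsets always land together. In the K:V layout each message stays small, but a reader can observe a partially written set of offsets unless the writes are transactional.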
bq. We could define that the scheme is the name of a system defined in the job
configuration.
This seems to be the cleanest/most general way to do things. It isn't
great from a usability perspective, but if we provide defaults for the
file/kafka systems, then it should be OK.
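A minimal sketch of how that lookup might work, assuming the config location is given as a URI and that the scheme is looked up under the usual systems.<name>.samza.factory key. The function name, the URI form, and the choice of default factories are assumptions for illustration, not a settled design.

```python
from urllib.parse import urlparse

# Assumed built-in defaults so that plain kafka:// and file:// URIs work
# without any extra job configuration. The choice of factories is illustrative.
DEFAULT_SYSTEM_FACTORIES = {
    "kafka": "org.apache.samza.system.kafka.KafkaSystemFactory",
    "file": "org.apache.samza.system.filereader.FileReaderSystemFactory",
}


def resolve_system_factory(uri, config):
    """Return (system name, factory class) for the scheme of the given URI.

    A system defined in the job config takes precedence over the defaults.
    """
    scheme = urlparse(uri).scheme
    if not scheme:
        raise ValueError("no scheme in URI: %r" % uri)
    configured = config.get("systems.%s.samza.factory" % scheme)
    if configured is not None:
        return scheme, configured
    if scheme in DEFAULT_SYSTEM_FACTORIES:
        return scheme, DEFAULT_SYSTEM_FACTORIES[scheme]
    raise KeyError("no system named %r in config and no default" % scheme)
```

With defaults in place, a dev setup could point at something like file:///tmp/job-config.log with no system configuration at all, while a job that defines its own system only needs the matching systems.<name>.samza.factory entry.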
bq. I think it would be very good if the Kafka runtime dependency is optional.
I agree.
bq. So I would be keen for Samza to be able to use files for checkpoints and
config in dev.
Sounds reasonable to me.
> Configure Samza jobs through a stream
> -------------------------------------
>
> Key: SAMZA-348
> URL: https://issues.apache.org/jira/browse/SAMZA-348
> Project: Samza
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Chris Riccomini
> Labels: project
> Attachments: DESIGN-SAMZA-348-0.md, DESIGN-SAMZA-348-0.pdf,
> DESIGN-SAMZA-348-1.md, DESIGN-SAMZA-348-1.pdf
>
>
> Samza's existing config setup is problematic for a number of reasons:
> # It's completely immutable once a job starts. This prevents any dynamic
> reconfiguration and auto-scaling. It is debatable whether we want these
> features or not, but our existing implementation actively prevents them. See
> SAMZA-334 for discussion.
> # We pass existing configuration through environment variables. YARN exports
> environment variables in a shell script, which limits the size to the maximum
> argument length (ARG_MAX) on the machine. This is usually ~128KB. See
> SAMZA-333 and SAMZA-337 for details.
> # User-defined configuration (the Config object) and programmatic
> configuration (checkpoints and TaskName:State mappings (see SAMZA-123)) are
> handled differently. It's debatable whether this makes sense.
> In SAMZA-123, [~jghoman] and I propose implementing a ConfigLog. This log
> would replace both the checkpoint topic and the existing config environment
> variables in SamzaContainer and Samza's YARN AM.
> I'd like to keep this ticket's scope limited to just the implementation of
> the ConfigLog, and not re-designing how Samza's config is used in the code
> (SAMZA-40). We should, however, discuss how this feature would affect dynamic
> reconfiguration/auto-scaling.