[jira] [Commented] (SAMZA-310) Publish container logs to a SystemStream

Martin Kleppmann (JIRA) Tue, 21 Oct 2014 12:21:06 -0700

    [ https://issues.apache.org/jira/browse/SAMZA-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178876#comment-14178876 ]


Martin Kleppmann commented on SAMZA-310:
----------------------------------------

bq. The AM also triggers the kafkaLog4jAppender. So before it reads config, it 
does not have any knowledge about the config at all. Basically, this means, the 
log4j.xml should be read even before the config file is read.

I believe the AM also receives the config in a SAMZA_CONFIG environment 
variable, does it not? If so, the appender should be able to read the config 
from there, so I don't see the problem.

bq. When containers read the config from the environment variable, they cannot 
tell which system to produce messages to.

We could introduce a new configuration property that tells the Log4j appender 
which system to use, e.g. {{job.log.system=kafka}} (analogous to 
{{task.checkpoint.system}}) or {{job.log.stream=kafka.my-log-topic}} (analogous 
to {{stores.*.changelog}}). Since the appender would then have full access to 
the config, it could also use a regular SystemProducer rather than being tied 
to Kafka.
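
To make the {{job.log.stream}} idea concrete, here is a rough sketch of how an 
appender might resolve the target system and stream from the config. The 
property name is only the one proposed in this comment, not an existing Samza 
setting. {{LogStreamConfig}} is a hypothetical helper, and splitting at the 
first dot is an assumption about the value format.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: "job.log.stream" is the property proposed in this comment,
// not an existing Samza setting, and LogStreamConfig is a hypothetical helper.
final class LogStreamConfig {
    static final String LOG_STREAM_KEY = "job.log.stream";

    final String system; // e.g. "kafka"
    final String stream; // e.g. "my-log-topic"

    private LogStreamConfig(String system, String stream) {
        this.system = system;
        this.stream = stream;
    }

    // Splits "<system>.<stream>" at the first dot, so that stream names
    // containing dots stay intact (an assumption about the value format).
    static LogStreamConfig fromConfig(Map<String, String> config) {
        String value = config.get(LOG_STREAM_KEY);
        if (value == null) {
            throw new IllegalArgumentException("Missing property " + LOG_STREAM_KEY);
        }
        int dot = value.indexOf('.');
        if (dot <= 0 || dot == value.length() - 1) {
            throw new IllegalArgumentException(
                "Expected <system>.<stream> but got: " + value);
        }
        return new LogStreamConfig(value.substring(0, dot), value.substring(dot + 1));
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put(LOG_STREAM_KEY, "kafka.my-log-topic");
        LogStreamConfig log = fromConfig(config);
        System.out.println(log.system + " / " + log.stream); // kafka / my-log-topic
    }
}
```

With the system name in hand, the appender could presumably look up the 
matching system factory from the same config and obtain a SystemProducer from 
it, which is what would keep it independent of Kafka.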

> Publish container logs to a SystemStream
> ----------------------------------------
>
>                 Key: SAMZA-310
>                 URL: https://issues.apache.org/jira/browse/SAMZA-310
>             Project: Samza
>          Issue Type: New Feature
>          Components: container
>    Affects Versions: 0.7.0
>            Reporter: Martin Kleppmann
>            Assignee: Yan Fang
>             Fix For: 0.8.0
>
>         Attachments: SAMZA-310.patch
>
>
> At the moment, it's a bit awkward to get to a Samza job's logs: assuming 
> you're running on YARN, you have to navigate around the YARN web interface, 
> and you can only see one container's logs at a time.
> Given that Samza is all about streams, it would make sense for the logs 
> generated by Samza jobs to also be sent to a stream. There, they could be 
> indexed with [Kibana|http://www.elasticsearch.org/overview/kibana/], consumed 
> by an exception-tracking system, etc.
> Notes:
> - The serde for encoding logs into a suitable wire format should be 
> pluggable. There can be a default implementation that uses JSON, analogous to 
> MetricsSnapshotSerdeFactory for metrics, but organisations that already have 
> a standardised in-house encoding for logs should be able to use it.
> - Should this be at the level of Slf4j or Log4j? Currently the log 
> configuration for YARN jobs uses Log4j, which has the advantage that any 
> frameworks/libraries that use Log4j but not Slf4j appear in the logs. 
> However, Samza itself currently only depends on Slf4j. If we tie this feature 
> to Log4j, it would somewhat defeat the purpose of using Slf4j.
> - Do we need to consider partitioning? Perhaps we can use the container name 
> as partitioning key, so that the ordering of logs from each container is 
> preserved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
