[jira] [Commented] (SAMZA-310) Publish container logs to a SystemStream

Chris Riccomini (JIRA) Fri, 01 Aug 2014 08:07:35 -0700

    [ 
https://issues.apache.org/jira/browse/SAMZA-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082316#comment-14082316
 ]


Chris Riccomini commented on SAMZA-310:
---------------------------------------

bq. slf4j has the MDC, but grizzled.slf4j does not...

Doh! I forgot SLF4J had an MDC as well. Given that Grizzled doesn't, let's just 
directly access SLF4J's. We'll have to add SLF4J as an explicit compile-time 
dependency, but that's fine, since Grizzled is already pulling it in 
transitively. 

In fact, I'd like to remove Grizzled at some point anyway. We can implement our 
own logging wrapper in a single class ([as Kafka 
does|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/utils/Logging.scala]), 
so I don't think Grizzled provides much benefit. I've opened SAMZA-361 for this.

bq. Can we just set up MDC at the starting time of the containers, instead of 
via TaskLifecycleListener? Since the goal of assigning the AM/ContainerID 
information is to have the key to the logs, this information can be retrieved 
at the starting time of the container.

+1 to this. Great idea. Your original idea about setting the TaskName (formerly 
the partition) every time we enter a TaskInstance's code to 
process/window/send/commit/close would also be useful. One thing to watch out 
for here is the performance impact of setting the MDC. My hope is that it's not 
thread-safe (no synchronization), and just a map update. We should run 
TestSamzaContainerPerformance, and verify that we're still getting good 
performance after this change.
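
To make the two hooks above concrete, here is a rough Java sketch: one call at 
container start for the container identity, and a wrapper that sets the task 
name each time we enter task code. Everything in it is an assumption for 
illustration. SketchMdc is a hypothetical per-thread map standing in for 
org.slf4j.MDC so the example runs without dependencies (real code would call 
org.slf4j.MDC.put and org.slf4j.MDC.remove), and the key names and the 
ContainerLoggingSketch class do not exist in Samza.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for org.slf4j.MDC: a per-thread map of diagnostic keys.
 * This is NOT SLF4J's implementation; it only mirrors the put/get/remove shape.
 */
final class SketchMdc {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    private SketchMdc() {}

    static void put(String key, String value) { CONTEXT.get().put(key, value); }
    static String get(String key) { return CONTEXT.get().get(key); }
    static void remove(String key) { CONTEXT.get().remove(key); }
}

/** Hypothetical helper showing where the two MDC updates would happen. */
final class ContainerLoggingSketch {
    // Key names are assumptions, not anything Samza defines.
    static final String CONTAINER_KEY = "containerName";
    static final String TASK_KEY = "taskName";

    private ContainerLoggingSketch() {}

    /** Called once when the container starts: this value never changes. */
    static void onContainerStart(String containerName) {
        SketchMdc.put(CONTAINER_KEY, containerName);
    }

    /** Wraps each entry into task code (process, window, commit, ...). */
    static void runInTask(String taskName, Runnable taskCode) {
        SketchMdc.put(TASK_KEY, taskName);
        try {
            taskCode.run();
        } finally {
            // Clear the task name so later container-level log lines
            // are not attributed to this task.
            SketchMdc.remove(TASK_KEY);
        }
    }
}
```

In this sketch the per-call cost is one thread-local lookup plus a map put and 
remove. Whether the real MDC is that cheap depends on the logging backend, 
which is why running TestSamzaContainerPerformance still matters. Since the 
context here is per-thread, the container-level key has to be set on the same 
thread that runs the task code.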

> Publish container logs to a SystemStream
> ----------------------------------------
>
>                 Key: SAMZA-310
>                 URL: https://issues.apache.org/jira/browse/SAMZA-310
>             Project: Samza
>          Issue Type: New Feature
>          Components: container
>    Affects Versions: 0.7.0
>            Reporter: Martin Kleppmann
>
> At the moment, it's a bit awkward to get to a Samza job's logs: assuming 
> you're running on YARN, you have to navigate around the YARN web interface, 
> and you can only see one container's logs at a time.
> Given that Samza is all about streams, it would make sense for the logs 
> generated by Samza jobs to also be sent to a stream. There, they could be 
> indexed with [Kibana|http://www.elasticsearch.org/overview/kibana/], consumed 
> by an exception-tracking system, etc.
> Notes:
> - The serde for encoding logs into a suitable wire format should be 
> pluggable. There can be a default implementation that uses JSON, analogous to 
> MetricsSnapshotSerdeFactory for metrics, but organisations that already have 
> a standardised in-house encoding for logs should be able to use it.
> - Should this be at the level of Slf4j or Log4j? Currently the log 
> configuration for YARN jobs uses Log4j, which has the advantage that any 
> frameworks/libraries that use Log4j but not Slf4j appear in the logs. 
> However, Samza itself currently only depends on Slf4j. If we tie this feature 
> to Log4j, it would somewhat defeat the purpose of using Slf4j.
> - Do we need to consider partitioning? Perhaps we can use the container name 
> as partitioning key, so that the ordering of logs from each container is 
> preserved.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
