[
https://issues.apache.org/jira/browse/SAMZA-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082316#comment-14082316
]
Chris Riccomini commented on SAMZA-310:
---------------------------------------
bq. slf4j has the MDC, but grizzled.slf4j does not...
Doh! I forgot SLF4J had an MDC as well. Given that Grizzled doesn't, let's just
directly access SLF4J's. We'll have to add SLF4J as an explicit compile-time
dependency, but that's fine, since Grizzled is already pulling it in
transitively.
In fact, I'd like to remove Grizzled at some point anyway. We can implement our
own ([as Kafka
does|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/utils/Logging.scala])
in a single class, so I don't think that it provides much benefit. I've opened
SAMZA-361 for this.
bq. Can we just set up MDC at the starting time of the containers, instead of
via TaskLifecycleListener? Since the goal of assigning the AM/ContainerID
information is to have the key to the logs, this information can be retrieved
at the starting time of the container.
+1 to this. Great idea. Your original idea about setting the TaskName (used to
be partition) every time we enter a TaskInstance's code to
process/window/send/commit/close would also be useful. One thing to watch out
for here is the performance impact of setting the MDC. My hope is that it's not
synchronized for thread safety, and is just a map update. We should run
TestSamzaContainerPerformance, and verify that we're still getting good
performance after this change.
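
To make the two ideas above concrete, here is a rough sketch of the pattern: set the container name once at start-up, and set the task name only while a task's code is running. It is an illustration only, not the SAMZA-310 patch. To keep it runnable without extra dependencies, MdcSketch is a hypothetical stand-in for org.slf4j.MDC (whose real calls are MDC.put, MDC.get and MDC.remove), and the key names "containerName" and "taskName", the helpers onContainerStart and withTask, and the output format are all assumptions. SLF4J documents the MDC as managed per thread, which is why the stand-in uses a ThreadLocal map and needs no locking.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for org.slf4j.MDC: a per-thread String-to-String map. */
class MdcSketch {
    // Each thread gets its own map, so put/remove are plain map updates.
    private static final ThreadLocal<Map<String, String>> CONTEXT =
        ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CONTEXT.get().put(key, value); }
    static String get(String key) { return CONTEXT.get().get(key); }
    static void remove(String key) { CONTEXT.get().remove(key); }

    // Assumed key names; the real ones would be chosen in the patch.
    static final String CONTAINER_KEY = "containerName";
    static final String TASK_KEY = "taskName";

    /** Called once when the container starts, on the thread that will log. */
    static void onContainerStart(String containerName) {
        put(CONTAINER_KEY, containerName);
    }

    /** Runs a task call (process/window/commit/...) with the task name set. */
    static void withTask(String taskName, Runnable body) {
        put(TASK_KEY, taskName);
        try {
            body.run();
        } finally {
            // Clear the key so later log lines are not attributed to this task.
            remove(TASK_KEY);
        }
    }

    /** Roughly what a layout that prints both context keys would produce. */
    static String format(String message) {
        return "[" + get(CONTAINER_KEY) + "][" + get(TASK_KEY) + "] " + message;
    }
}
```

Since the context is per thread, a value set at container start is only visible on the thread that set it (whether child threads inherit it depends on the logging backend), so this assumes the set-up happens on the thread that runs the tasks. The put/remove pair in withTask is the part worth measuring with TestSamzaContainerPerformance.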
> Publish container logs to a SystemStream
> ----------------------------------------
>
> Key: SAMZA-310
> URL: https://issues.apache.org/jira/browse/SAMZA-310
> Project: Samza
> Issue Type: New Feature
> Components: container
> Affects Versions: 0.7.0
> Reporter: Martin Kleppmann
>
> At the moment, it's a bit awkward to get to a Samza job's logs: assuming
> you're running on YARN, you have to navigate around the YARN web interface,
> and you can only see one container's logs at a time.
> Given that Samza is all about streams, it would make sense for the logs
> generated by Samza jobs to also be sent to a stream. There, they could be
> indexed with [Kibana|http://www.elasticsearch.org/overview/kibana/], consumed
> by an exception-tracking system, etc.
> Notes:
> - The serde for encoding logs into a suitable wire format should be
> pluggable. There can be a default implementation that uses JSON, analogous to
> MetricsSnapshotSerdeFactory for metrics, but organisations that already have
> a standardised in-house encoding for logs should be able to use it.
> - Should this be at the level of Slf4j or Log4j? Currently the log
> configuration for YARN jobs uses Log4j, which has the advantage that any
> frameworks/libraries that use Log4j but not Slf4j appear in the logs.
> However, Samza itself currently only depends on Slf4j. If we tie this feature
> to Log4j, it would somewhat defeat the purpose of using Slf4j.
> - Do we need to consider partitioning? Perhaps we can use the container name
> as partitioning key, so that the ordering of logs from each container is
> preserved.
--
This message was sent by Atlassian JIRA
(v6.2#6252)