Martin Kleppmann created SAMZA-310:
--------------------------------------
Summary: Publish container logs to a SystemStream
Key: SAMZA-310
URL: https://issues.apache.org/jira/browse/SAMZA-310
Project: Samza
Issue Type: New Feature
Components: container
Affects Versions: 0.7.0
Reporter: Martin Kleppmann
At the moment, it's a bit awkward to get to a Samza job's logs: assuming you're
running on YARN, you have to navigate around the YARN web interface, and you
can only see one container's logs at a time.
Given that Samza is all about streams, it would make sense for the logs
generated by Samza jobs to also be sent to a stream. There, they could be
indexed in Elasticsearch and explored with
[Kibana|http://www.elasticsearch.org/overview/kibana/], consumed by an exception-tracking system, etc.
Notes:
- The serde for encoding logs into a suitable wire format should be pluggable.
There can be a default implementation that uses JSON, analogous to
MetricsSnapshotSerdeFactory for metrics, but organisations that already have a
standardised in-house encoding for logs should be able to use it.
- Should this be at the level of Slf4j or Log4j? Currently the log
configuration for YARN jobs uses Log4j, which has the advantage that output from
frameworks/libraries that log via Log4j directly (rather than via Slf4j) also
appears in the logs. However,
Samza itself currently only depends on Slf4j. If we tie this feature to Log4j,
it would somewhat defeat the purpose of using Slf4j.
- Do we need to consider partitioning? Perhaps we can use the container name as
partitioning key, so that the ordering of logs from each container is preserved.
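A rough sketch of the serde and partitioning notes, in plain Java. Everything here is hypothetical: LogEvent, LogEventSerde, JsonLogEventSerde and partitionFor are illustrative names, not existing Samza classes, and the field set of LogEvent is an assumption. The serde interface borrows only the toBytes shape of Samza's org.apache.samza.serializers.Serde and is declared locally so the example compiles on its own. It also sidesteps the Slf4j vs Log4j question: whichever logging layer is chosen would build a LogEvent and hand it to the configured serde.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch for SAMZA-310; none of these types exist in Samza.
class LogStreamSketch {

  // One log line as it might travel on the log stream (field set is an assumption).
  record LogEvent(long timestampMs, String level, String logger,
                  String container, String message) {}

  // Pluggable encoding, shaped like the toBytes half of Samza's Serde.
  // Organisations with an in-house log format would supply their own implementation.
  interface LogEventSerde {
    byte[] toBytes(LogEvent event);
  }

  // Default implementation: one flat JSON object per event, UTF-8 encoded.
  static final class JsonLogEventSerde implements LogEventSerde {
    @Override
    public byte[] toBytes(LogEvent e) {
      String json = "{\"timestamp\":" + e.timestampMs()
          + ",\"level\":" + quote(e.level())
          + ",\"logger\":" + quote(e.logger())
          + ",\"container\":" + quote(e.container())
          + ",\"message\":" + quote(e.message())
          + "}";
      return json.getBytes(StandardCharsets.UTF_8);
    }

    // Minimal JSON string escaping; a null field becomes the JSON literal null.
    private static String quote(String s) {
      if (s == null) {
        return "null";
      }
      StringBuilder sb = new StringBuilder("\"");
      for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        switch (c) {
          case '"' -> sb.append("\\\"");
          case '\\' -> sb.append("\\\\");
          case '\n' -> sb.append("\\n");
          case '\r' -> sb.append("\\r");
          case '\t' -> sb.append("\\t");
          default -> {
            if (c < 0x20) {
              // Other control characters are written as \\uXXXX escapes.
              sb.append(String.format("\\u%04x", (int) c));
            } else {
              sb.append(c);
            }
          }
        }
      }
      return sb.append('"').toString();
    }
  }

  // Uses the container name as the partition key, so all lines from one container
  // land in the same partition and keep their relative order.
  // Assumes a non-null container name and numPartitions > 0.
  static int partitionFor(LogEvent event, int numPartitions) {
    return Math.floorMod(event.container().hashCode(), numPartitions);
  }

  public static void main(String[] args) {
    LogEventSerde serde = new JsonLogEventSerde();
    LogEvent event = new LogEvent(1404000000000L, "WARN", "org.example.MyTask",
        "samza-container-0", "slow \"flush\" took 5s");
    System.out.println(new String(serde.toBytes(event), StandardCharsets.UTF_8));
    System.out.println("partition " + partitionFor(event, 8));
  }
}
```

Hashing the container name only preserves ordering within a single container; lines from different containers may interleave arbitrarily across partitions, which seems acceptable given that each container's log is read as its own sequence today.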