[ https://issues.apache.org/jira/browse/SAMZA-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215017#comment-14215017 ]

Martin Kleppmann commented on SAMZA-310:
----------------------------------------

Nice work! I added some comments on the RB. I also tried adding this appender 
to hello-samza, and running the wikipedia-feed job. It partially worked:

* AM logs appeared in Kafka, except that the first 35 or so lines of the log 
were missing. Do you know why that would be? Is there any way we can capture 
logs right from the start (buffering them if the producer is not yet 
connected)?
* When I looked at the output of kafka-console-consumer, the lines of the AM 
logs appeared in a different order from how they appeared in the regular log 
file. Any idea why? I thought using the container name as message key should 
make all the messages go to the same partition, and thus preserve their order.
* Logs from the other container (non-AM) did not appear in Kafka. The following 
error appeared on the container's stdout:

{noformat}
log4j:ERROR Could not create an Appender. Reported error follows.
org.apache.samza.SamzaException: can not read the config from {"config":{"systems.kafka.consumer.zookeeper.connect":"localhost:2181/","systems.kafka.samza.factory":"org.apache.samza.system.kafka.KafkaSystemFactory","systems.wikipedia.port":"6667","task.inputs":"wikipedia.#en.wikipedia,wikipedia.#en.wiktionary,wikipedia.#en.wikinews","systems.wikipedia.host":"irc.wikimedia.org","systems.kafka.producer.producer.type":"sync","yarn.package.path":"file:///Users/martin/dev/samza/hello-samza/target/hello-samza-0.8.0-dist.tar.gz","job.factory.class":"org.apache.samza.job.yarn.YarnJobFactory","systems.kafka.producer.metadata.broker.list":"localhost:9092","task.class":"samza.examples.wikipedia.task.WikipediaFeedStreamTask","systems.kafka.producer.batch.num.messages":"1","systems.kafka.samza.msg.serde":"json","job.name":"wikipedia-feed","serializers.registry.json.class":"org.apache.samza.serializers.JsonSerdeFactory","task.log4j.system":"kafka","systems.wikipedia.samza.factory":"samza.examples.wikipedia.system.WikipediaSystemFactory"},"containers":{"0":{"container-id":0,"tasks":{"Partition 0":{"task-name":"Partition 0","system-stream-partitions":[{"system":"wikipedia","partition":0,"stream":"#en.wikinews"},{"system":"wikipedia","partition":0,"stream":"#en.wiktionary"},{"system":"wikipedia","partition":0,"stream":"#en.wikipedia"}],"changelog-partition":0}}}}}
        at org.apache.samza.logging.log4j.SystemProducerAppender.getConfig(SystemProducerAppender.java:185)
        at org.apache.samza.logging.log4j.SystemProducerAppender.activateOptions(SystemProducerAppender.java:82)
        at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
        at org.apache.log4j.xml.DOMConfigurator.parseAppender(DOMConfigurator.java:295)
        at org.apache.log4j.xml.DOMConfigurator.findAppenderByName(DOMConfigurator.java:176)
        at org.apache.log4j.xml.DOMConfigurator.findAppenderByReference(DOMConfigurator.java:191)
        at org.apache.log4j.xml.DOMConfigurator.parseChildrenOfLoggerElement(DOMConfigurator.java:523)
        at org.apache.log4j.xml.DOMConfigurator.parseRoot(DOMConfigurator.java:492)
        at org.apache.log4j.xml.DOMConfigurator.parse(DOMConfigurator.java:1001)
        at org.apache.log4j.xml.DOMConfigurator.doConfigure(DOMConfigurator.java:867)
        at org.apache.log4j.xml.DOMConfigurator.doConfigure(DOMConfigurator.java:773)
        at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:483)
        at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
        at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:73)
        at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:253)
        at org.apache.samza.util.Logging$class.logger(Logging.scala:27)
        at org.apache.samza.metrics.JmxServer.logger$lzycompute(JmxServer.scala:41)
        at org.apache.samza.metrics.JmxServer.logger(JmxServer.scala:41)
        at org.apache.samza.util.Logging$class.info(Logging.scala:54)
        at org.apache.samza.metrics.JmxServer.info(JmxServer.scala:41)
        at org.apache.samza.metrics.JmxServer.<init>(JmxServer.scala:73)
        at org.apache.samza.metrics.JmxServer.<init>(JmxServer.scala:44)
        at org.apache.samza.container.SamzaContainer$.safeMain$default$1(SamzaContainer.scala:72)
        at org.apache.samza.container.SamzaContainer$.main(SamzaContainer.scala:69)
        at org.apache.samza.container.SamzaContainer.main(SamzaContainer.scala)
Caused by: org.codehaus.jackson.map.JsonMappingException: Can not deserialize instance of java.lang.String out of START_OBJECT token
 at [Source: N/A; line: -1, column: -1]
        at org.codehaus.jackson.map.JsonMappingException.from(JsonMappingException.java:163)
        at org.codehaus.jackson.map.deser.StdDeserializationContext.mappingException(StdDeserializationContext.java:198)
        at org.codehaus.jackson.map.deser.StdDeserializer$StringDeserializer.deserialize(StdDeserializer.java:671)
        at org.codehaus.jackson.map.deser.StdDeserializer$StringDeserializer.deserialize(StdDeserializer.java:640)
        at org.codehaus.jackson.map.deser.MapDeserializer._readAndBind(MapDeserializer.java:235)
        at org.codehaus.jackson.map.deser.MapDeserializer.deserialize(MapDeserializer.java:165)
        at org.codehaus.jackson.map.deser.MapDeserializer.deserialize(MapDeserializer.java:25)
        at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2376)
        at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1747)
        at org.apache.samza.serializers.model.SamzaObjectMapper$ConfigDeserializer.deserialize(SamzaObjectMapper.java:104)
        at org.apache.samza.serializers.model.SamzaObjectMapper$ConfigDeserializer.deserialize(SamzaObjectMapper.java:99)
        at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2395)
        at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1595)
        at org.apache.samza.logging.log4j.SystemProducerAppender.getConfig(SystemProducerAppender.java:182)
        ... 24 more
{noformat}
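
On the first point, here is a minimal sketch of the buffering idea (hypothetical class and method names, not the code on the RB): queue log events in memory until the system producer has started, then flush them in their original order.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch, not the actual SAMZA-310 appender: events logged
// before the producer is connected are buffered rather than dropped.
public class BufferingAppenderSketch {
    private final Queue<String> buffer = new ArrayDeque<>();
    private boolean producerReady = false;

    // Called for every log event. If the producer is not connected yet,
    // hold the event in memory instead of losing it.
    public synchronized void append(String event) {
        if (producerReady) {
            send(event);
        } else {
            buffer.add(event);
        }
    }

    // Called once the underlying system producer has started: flush the
    // buffered events in order, then deliver subsequent events directly.
    public synchronized void onProducerReady() {
        producerReady = true;
        while (!buffer.isEmpty()) {
            send(buffer.poll());
        }
    }

    // Stand-in for the real producer call; overridable for testing.
    protected void send(String event) {
        System.out.println(event);
    }
}
```

Something along these lines would let the appender capture the first ~35 lines that are currently lost during startup, at the cost of bounding (or spilling) the buffer if the producer never connects.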

> Publish container logs to a SystemStream
> ----------------------------------------
>
>                 Key: SAMZA-310
>                 URL: https://issues.apache.org/jira/browse/SAMZA-310
>             Project: Samza
>          Issue Type: New Feature
>          Components: container
>    Affects Versions: 0.7.0
>            Reporter: Martin Kleppmann
>            Assignee: Yan Fang
>             Fix For: 0.8.0
>
>         Attachments: SAMZA-310.1.patch, SAMZA-310.patch
>
>
> At the moment, it's a bit awkward to get to a Samza job's logs: assuming 
> you're running on YARN, you have to navigate around the YARN web interface, 
> and you can only see one container's logs at a time.
> Given that Samza is all about streams, it would make sense for the logs 
> generated by Samza jobs to also be sent to a stream. There, they could be 
> indexed with [Kibana|http://www.elasticsearch.org/overview/kibana/], consumed 
> by an exception-tracking system, etc.
> Notes:
> - The serde for encoding logs into a suitable wire format should be 
> pluggable. There can be a default implementation that uses JSON, analogous to 
> MetricsSnapshotSerdeFactory for metrics, but organisations that already have 
> a standardised in-house encoding for logs should be able to use it.
> - Should this be at the level of Slf4j or Log4j? Currently the log 
> configuration for YARN jobs uses Log4j, which has the advantage that any 
> frameworks/libraries that use Log4j but not Slf4j appear in the logs. 
> However, Samza itself currently only depends on Slf4j. If we tie this feature 
> to Log4j, it would somewhat defeat the purpose of using Slf4j.
> - Do we need to consider partitioning? Perhaps we can use the container name 
> as partitioning key, so that the ordering of logs from each container is 
> preserved.
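
The partitioning note in the quoted description can be sketched as follows (a hypothetical helper, not Samza's or Kafka's actual partitioner): hashing the container name deterministically sends every log line from one container to the same partition, which preserves per-container ordering.

```java
// Hypothetical sketch of key-based partitioning: a fixed hash of the key
// picks the partition, so all messages with the same key (here, the
// container name) land in the same partition, in send order.
public class KeyPartitioner {
    public static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the modulo result is non-negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // Every message keyed by the same container name maps to one
        // partition, so that container's log order is preserved.
        System.out.println(partitionFor("samza-container-0", 8));
    }
}
```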



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
