[jira] [Comment Edited] (KAFKA-3816) Provide more context in Kafka Connect log messages

Randall Hauch (JIRA) Thu, 09 Jun 2016 18:13:01 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323684#comment-15323684
 ]


Randall Hauch edited comment on KAFKA-3816 at 6/10/16 1:11 AM:
---------------------------------------------------------------

{quote}
One thing we might want to do in conjunction with this is to allow the 
framework to allocate threads for connectors and set this up. Background 
threads that monitor sources are a particular case that we don't provide any 
help to connector developers for and would mean we might be missing this info 
in those threads.
{quote}

The simple approach is to have the connector code that creates threads set this 
up, but I agree it's fairly complex and will likely be ignored or not done. We 
can provide a factory method for threads, perhaps on the context, but we'd 
probably have to provide multiple overloaded methods to provide the flexibility 
they need. 

Perhaps a slightly easier way is to provide to the contexts a new interface 
called {{LoggingContext}} that has methods that connector and task 
implementations can call:

{code:java}
public interface LoggingContext {
     public void use(String contextName);
}
{code}

The reason I'd suggest an interface is just in case we want to provide 
additional helper methods in the future. For example, with Java 8 it's really 
easy to temporarily perform logic within a custom context: 

{code:java}
loggingContext.withContext("newContextName", ()->{ ... });
{code}

where that lambda might log messages, and after the lambda completes it 
automatically sets the MDC context back to what it was before this method is 
called. (I'm not saying we need this now, but I wouldn't be surprised if we 
want to add a few helper methods in the future.)

I'm also not a fan of hiding too much (e.g., if Kafka Connector provides 
subclasses of {{SourceTask}} that handle some of the threading automatically). 
In Debezium our {{SourceTask}} implementation in our MySQL connector actually 
doesn't create any threads; instead, it creates 1 or more "reader" components 
(e.g., snapshot reader, binlog reader, etc.) based upon what the task needs to 
do, and the task then runs these sequentially. Each reader implementation can 
create their own threads if needed or uses libraries that create threads. And 
while libraries rarely give you the control you need for thread creation, they 
often take callbacks or make other calls to methods that you do provide, and 
its in those methods that you have a chance to set the MDC parameters. This 
would be easy if the connector code can call methods, but much harder if you 
just provide a thread factory that sets it up for those threads (and only those 
threads).

{quote}
For the naming – the scope makes sense but we want a reliable way to extract 
this from the connector without it having to provide extra info if possible. 
Basing this on class name could work. My guess is that 3 parameters should be 
sufficient (and if not, we probably need to do some refactoring to simplify 
that code), but we'll just have to try it to find out.
{quote}

Yeah, I agree. Perhaps a non-abstract method (e.g., {{getConnectorTypeName()}}) 
on the Connector class that defaults to grabbing the class short name. Then, 
subclasses can always override this to provide something shorter and more 
meaningful.


was (Author: rhauch):
{quote}
One thing we might want to do in conjunction with this is to allow the 
framework to allocate threads for connectors and set this up. Background 
threads that monitor sources are a particular case that we don't provide any 
help to connector developers for and would mean we might be missing this info 
in those threads.
{quote}

The simple approach is to have the connector code that creates threads set this 
up, but I agree it's fairly complex and will likely be ignored or not done. We 
can provide a factory method for threads, perhaps on the context, but we'd 
probably have to provide multiple overloaded methods to provide the flexibility 
they need. 

Perhaps a slightly easier way is to provide to the contexts a new interface 
called {{LoggingContext}} that has methods that connector and task 
implementations can call:

{code:java}
public interface LoggingContext {
     public void use(String contextName);
}
{code}

The reason I'd suggest an interface is just in case we want to provide 
additional helper methods in the future. For example, with Java 8 it's really 
easy to temporarily perform logic within a custom context: 

{code:java}
loggingContext.withContext("newContextName", ()->{ ... });
{code}

where that lambda might log messages, and after the lambda completes it 
automatically sets the MDC context back to what it was before this method is 
called. (I'm not saying we need this now, but I wouldn't be surprised if we 
want to add a few helper methods in the future.)

I'm also not a fan of hiding too much (e.g., if Kafka Connector provides 
subclasses of {{SourceTask}} that handle some of the threading automatically). 
In Debezium our {{SourceTask}} implementation actually doesn't create the 
threads; instead, it creates 1 or more "reader" components (depending upon what 
needs to be done) that are used sequentially, and each reader implementations 
(e.g., snapshot reader, binlog reader, etc.) create their own threads if needed 
or uses libraries that create threads. And while libraries rarely give you the 
control you need for thread creation, they often take callbacks or make other 
calls to methods that you provide, and its in those methods that you have a 
chance to set the MDC parameters.

{quote}
For the naming – the scope makes sense but we want a reliable way to extract 
this from the connector without it having to provide extra info if possible. 
Basing this on class name could work. My guess is that 3 parameters should be 
sufficient (and if not, we probably need to do some refactoring to simplify 
that code), but we'll just have to try it to find out.
{quote}

Yeah, I agree. Perhaps a non-abstract method (e.g., {{getConnectorTypeName()}}) 
on the Connector class that defaults to grabbing the class short name. Then, 
subclasses can always override this to provide something shorter and more 
meaningful.

> Provide more context in Kafka Connect log messages
> --------------------------------------------------
>
>                 Key: KAFKA-3816
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3816
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 0.9.0.1
>            Reporter: Randall Hauch
>            Assignee: Ewen Cheslack-Postava
>
> Currently it is relatively difficult to correlate individual log messages 
> with the various threads and activities that are going on within a Kafka 
> Connect worker, let along a cluster of workers. Log messages should provide 
> more context to make it easier and to allow log scraping tools to coalesce 
> related log messages.
> One simple way to do this is by using _mapped diagnostic contexts_, or MDC. 
> This is supported by the SLF4J API, and by the Logback and Log4J logging 
> frameworks.
> Basically, the framework would be changed so that each thread is configured 
> with one or more MDC parameters using the 
> {{org.slf4j.MDC.put(String,String)}} method in SLF4J. Once that thread is 
> configured, all log messages made using that thread have that context. The 
> logs can then be configured to use those parameters.
> It would be ideal to define a convention for connectors and the Kafka Connect 
> framework. A single set of MDC parameters means that the logging framework 
> can use the specific parameters on its message formats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (KAFKA-3816) Provide more context in Kafka Connect log messages

Reply via email to