[ 
https://issues.apache.org/jira/browse/TEZ-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784727#comment-16784727
 ] 

Prasanth Jayachandran commented on TEZ-4039:
--------------------------------------------

[~jeagles] queryId is a Hive construct that can span one or more DAGs (for 
example, union all queries). Currently, Hive uses the CallableWithNdc provided 
by Tez to set up three IDs, from coarser to finer granularity: queryId, dagId, 
and fragmentId (the Tez attempt ID). The problem with NDC is that it can only 
be used to print values in log lines. If we want ID-based log routing, there is 
no key associated with an NDC value, so routing based on keys is not possible. 
IIRC Tez does something via log4j APIs to route logs per DAG in the AM, which 
could be simplified if we move to MDC (it then becomes configurable from log4j 
properties).

I think copying NDC to MDC will not be possible either, because at the time of 
copying we do not know which MDC key to associate with each NDC value. 
Example: NDC could hold 3 values like 10 -> 200 -> 300, and when we copy them 
to MDC we don't know the keys or how to associate the values with them.
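
To make the difference concrete, here is a minimal sketch using plain Java 
collections as stand-ins for the two context types (these are not the log4j or 
slf4j classes; the class and method names are hypothetical). The NDC-style 
context is an ordered stack of bare values, while the MDC-style context names 
each value, which is what key-based routing needs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: plain collections standing in for NDC and MDC.
final class NdcVersusMdc {

  /** NDC-style context: values pushed in order, with no names attached. */
  static Deque<String> ndcStyle() {
    Deque<String> stack = new ArrayDeque<>();
    stack.push("10");   // pushed first, meant as the queryId
    stack.push("200");  // meant as the dagId
    stack.push("300");  // meant as the fragmentId, now on top of the stack
    return stack;
  }

  /** MDC-style context: every value is stored under an explicit key. */
  static Map<String, String> mdcStyle() {
    Map<String, String> context = new LinkedHashMap<>();
    context.put("queryId", "10");
    context.put("dagId", "200");
    context.put("fragmentId", "300");
    return context;
  }

  public static void main(String[] args) {
    // Only the values survive in the stack; which one is the dagId is a guess.
    System.out.println("NDC-style values: " + ndcStyle());
    // The keyed context can answer "what is the dagId?" directly.
    System.out.println("MDC-style dagId:  " + mdcStyle().get("dagId"));
  }
}
```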

One possible solution is for Tez to provide a CallableWithMdc, which Hive would 
use to copy the IDs along with their keys:

queryId -> 10

dagId -> 200

fragmentId -> 300
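
A rough sketch of what such a CallableWithMdc could look like follows. It is 
not existing Tez code: the class shape, the callInternal() hook, and the demo 
class are assumptions for illustration. SimpleMdc is a hypothetical stand-in 
so the sketch runs without slf4j on the classpath; in real code org.slf4j.MDC 
(put, get, getCopyOfContextMap, setContextMap, clear) would take its place.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for org.slf4j.MDC: a per-thread map of keyed context values. */
final class SimpleMdc {
  private static final ThreadLocal<Map<String, String>> CONTEXT =
      ThreadLocal.withInitial(HashMap::new);

  static void put(String key, String value) { CONTEXT.get().put(key, value); }
  static String get(String key) { return CONTEXT.get().get(key); }
  static Map<String, String> getCopyOfContextMap() { return new HashMap<>(CONTEXT.get()); }
  static void setContextMap(Map<String, String> map) { CONTEXT.set(new HashMap<>(map)); }
  static void clear() { CONTEXT.get().clear(); }
}

/** Hypothetical CallableWithMdc: captures the submitter's keyed context at construction. */
abstract class CallableWithMdc<T> implements Callable<T> {
  // Snapshot taken on the thread that creates the callable (the submitter).
  private final Map<String, String> captured = SimpleMdc.getCopyOfContextMap();

  protected abstract T callInternal() throws Exception;

  @Override
  public final T call() throws Exception {
    // Save the worker thread's own context, install the captured one, then restore.
    Map<String, String> previous = SimpleMdc.getCopyOfContextMap();
    SimpleMdc.setContextMap(captured);
    try {
      return callInternal();
    } finally {
      SimpleMdc.setContextMap(previous);
    }
  }
}

final class CallableWithMdcDemo {
  /** Sets the three IDs on the calling thread and reads them back on a pool thread. */
  static String runFragment() {
    SimpleMdc.put("queryId", "10");
    SimpleMdc.put("dagId", "200");
    SimpleMdc.put("fragmentId", "300");
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      return pool.submit(new CallableWithMdc<String>() {
        @Override
        protected String callInternal() {
          return SimpleMdc.get("queryId") + "/" + SimpleMdc.get("dagId")
              + "/" + SimpleMdc.get("fragmentId");
        }
      }).get();
    } catch (Exception e) {
      throw new IllegalStateException("fragment failed", e);
    } finally {
      pool.shutdown();
      SimpleMdc.clear();
    }
  }

  public static void main(String[] args) {
    System.out.println(runFragment()); // prints 10/200/300
  }
}
```

Because each value travels with its key, the worker thread does not have to 
guess which ID is which, unlike the NDC case.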

 

An alternative option is to specify custom MDC key-value pairs via config. We 
could add a Tez config that lists custom keys to add to MDC, with the value of 
each key obtained from the conf object as well. 

A rough example of this could be:

tez.mdc.custom.keys=queryId

tez.mdc.custom.keys.values.from=hive.query.id

This says: add the MDC key 'queryId', with its value read from the 
'hive.query.id' configuration property, which Hive will pass to the AM. 
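
A minimal sketch of how these two properties might be resolved is shown below. 
The property names come from the example above; everything else is an 
assumption: a plain Map stands in for the Tez/Hadoop conf object, both 
properties are treated as comma-separated lists matched by position, and the 
class name is hypothetical. The returned entries are what would then be put 
into MDC.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: resolves config-declared MDC keys to their values.
final class CustomMdcKeys {
  static final String KEYS = "tez.mdc.custom.keys";
  static final String VALUES_FROM = "tez.mdc.custom.keys.values.from";

  /** Returns the MDC entries described by the two custom-key properties. */
  static Map<String, String> resolve(Map<String, String> conf) {
    String[] keys = split(conf.get(KEYS));
    String[] sources = split(conf.get(VALUES_FROM));
    if (keys.length != sources.length) {
      throw new IllegalArgumentException(
          KEYS + " and " + VALUES_FROM + " must list the same number of entries");
    }
    Map<String, String> entries = new LinkedHashMap<>();
    for (int i = 0; i < keys.length; i++) {
      // Assumption: the i-th key takes its value from the i-th source property.
      String value = conf.get(sources[i]);
      if (value != null) {
        entries.put(keys[i], value); // skip keys whose source property is unset
      }
    }
    return entries;
  }

  /** Splits a comma-separated list, tolerating a missing or blank property. */
  private static String[] split(String csv) {
    if (csv == null || csv.trim().isEmpty()) {
      return new String[0];
    }
    return csv.trim().split("\\s*,\\s*");
  }

  public static void main(String[] args) {
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put(KEYS, "queryId");
    conf.put(VALUES_FROM, "hive.query.id");
    conf.put("hive.query.id", "10"); // would be set by Hive and passed to the AM
    System.out.println(resolve(conf));
  }
}
```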

 

This has to be done for both task-side logs and AM logs.

Log4j2 is not required for this functionality. We could use MDC from slf4j.

Since Tez mostly uses slf4j everywhere, with classpath tricks we could make Tez 
work with log4j2 (we just need to put the log4j2 jar and the log4j2 API bridge 
jar at the top of the classpath, and slf4j will automatically bind to log4j2). 
If Tez uses non-standard log4j 1.x APIs, then this bridge might not work. 

> Tez should inject dag id, query id into MDC
> -------------------------------------------
>
>                 Key: TEZ-4039
>                 URL: https://issues.apache.org/jira/browse/TEZ-4039
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.9.next
>            Reporter: Prasanth Jayachandran
>            Priority: Major
>
> Tez currently uses CallableWithNdc to store thread-specific context. It 
> should also inject the context into MDC so that the pattern layout can dump 
> the contexts from MDC (with NDC it is not possible to read the context in the 
> pattern layout).
> Hive, for example, sets queryId in the MDC and the pattern layout prints the 
> queryId:
>  
> {code:java}
> %d{ISO8601} %-5p [%t (%X{queryId})] %c{2}: %m%n
> {code}
> LLAP sets dagId, fragmentId, and queryId in MDC, which is used for 
> queryId-based routing of logs.
> Similarly, the Tez AM should set dagId and queryId (if available) in MDC. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
