[ https://issues.apache.org/jira/browse/TEZ-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16784727#comment-16784727 ]
Prasanth Jayachandran commented on TEZ-4039:
--------------------------------------------
[~jeagles] queryId is a Hive construct which can span one or more DAGs (e.g.
union all queries). Currently, Hive uses the CallableWithNdc provided by Tez,
through which it sets up 3 IDs from coarser to finer granularity: queryId, dagId
and fragmentId (Tez attempt ID). The problem with NDC is that its values can
only be printed in log lines; there is no key associated with an NDC entry, so
key-based log routing is not possible. IIRC Tez does something via log4j APIs
to route logs per DAG in the AM, which could be simplified if we move to MDC
(routing then becomes configurable from log4j properties).
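To illustrate why the key matters for routing, here is a minimal sketch, assuming a plain ThreadLocal map as a stand-in for org.slf4j.MDC so it runs with only the JDK. The router looks up one named key (dagId here) and derives a file name from it; an NDC only holds an unlabeled stack of values, so there is nothing to look up. The class and method names are hypothetical, and this is not how log4j implements routing internally.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of key-based log routing.
final class MdcRoutingSketch {
    // Stand-in for org.slf4j.MDC: a per-thread map of context keys to values.
    static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    /** Picks a log file name from the context value stored under routingKey. */
    static String routeFor(String routingKey) {
        String value = CONTEXT.get().get(routingKey);
        // Lines logged without the key fall back to a shared default file.
        return value == null ? "default.log" : value + ".log";
    }

    public static void main(String[] args) {
        System.out.println(routeFor("dagId"));   // default.log
        CONTEXT.get().put("queryId", "10");
        CONTEXT.get().put("dagId", "200");
        System.out.println(routeFor("dagId"));   // 200.log
        System.out.println(routeFor("queryId")); // 10.log
    }
}
{code}

Because the lookup is by name, the routing key could come from log4j properties rather than being hard-coded in Tez.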
I don't think copying NDC to MDC is possible either, because at copy time we do
not know which MDC key to associate with each NDC value.
Example: NDC could hold 3 values like 10 -> 200 -> 300; when we copy them to
MDC we don't know the keys or how to map the values to those keys.
One possible solution is for Tez to provide a CallableWithMdc, which Hive would
use to copy the IDs along with their keys:
queryId -> 10
dagId -> 200
fragmentId -> 300
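A rough sketch of what such a CallableWithMdc could look like. It is hypothetical (not an existing Tez class), and a ThreadLocal map stands in for org.slf4j.MDC; real code would use MDC.put/MDC.remove or MDC.getCopyOfContextMap/MDC.setContextMap instead. The point is that each key travels together with its value, and the worker thread's previous context is restored after the task, which matters for pooled threads.

{code:java}
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical CallableWithMdc: not an existing Tez class.
final class CallableWithMdc<T> implements Callable<T> {
    // Stand-in for org.slf4j.MDC: one context map per thread.
    static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    private final Map<String, String> ids;
    private final Callable<T> delegate;

    CallableWithMdc(Map<String, String> ids, Callable<T> delegate) {
        this.ids = new LinkedHashMap<>(ids); // keys travel with their values
        this.delegate = delegate;
    }

    @Override
    public T call() throws Exception {
        Map<String, String> current = CONTEXT.get();
        Map<String, String> saved = new HashMap<>(current);
        current.putAll(ids); // e.g. queryId, dagId, fragmentId
        try {
            return delegate.call();
        } finally {
            // Restore the pooled thread's previous context.
            current.clear();
            current.putAll(saved);
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> ids = new LinkedHashMap<>();
        ids.put("queryId", "10");
        ids.put("dagId", "200");
        ids.put("fragmentId", "300");

        Callable<String> task =
                new CallableWithMdc<>(ids, () -> CONTEXT.get().get("dagId"));
        Callable<String> probe = () -> CONTEXT.get().toString();

        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            System.out.println(pool.submit(task).get());  // 200
            System.out.println(pool.submit(probe).get()); // {} (context restored)
        } finally {
            pool.shutdown();
        }
    }
}
{code}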
An alternative is to specify custom MDC key-value pairs via config. We could add
a Tez config listing custom MDC keys, whose values can be obtained from the conf
object as well.
A rough example of this could be:
tez.mdc.custom.keys=queryId
tez.mdc.custom.keys.values.from=hive.query.id
This says: add the MDC key 'queryId' whose value is read from the
'hive.query.id' configuration, which Hive passes to the AM.
This has to be done for both task-side logs and AM logs.
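A minimal sketch of the config-driven option, assuming the property names proposed above and additionally assuming both properties may hold comma-separated lists paired by position (the single-key case is a one-element list). A plain Map stands in for the Hadoop Configuration object; the class name is hypothetical, and in Tez the resulting entries would be handed to MDC.put(key, value).

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the property names are the ones proposed in this
// comment, not an existing Tez API.
final class MdcCustomKeys {
    static final String KEYS = "tez.mdc.custom.keys";
    static final String VALUES_FROM = "tez.mdc.custom.keys.values.from";

    /** Returns MDC key -> value, reading each value from the named config property. */
    static Map<String, String> resolve(Map<String, String> conf) {
        Map<String, String> mdc = new LinkedHashMap<>();
        String keys = conf.get(KEYS);
        String props = conf.get(VALUES_FROM);
        if (keys == null || props == null) {
            return mdc; // feature not configured
        }
        String[] keyList = keys.split(",");
        String[] propList = props.split(",");
        if (keyList.length != propList.length) {
            throw new IllegalArgumentException(KEYS + " and " + VALUES_FROM
                    + " must list the same number of entries");
        }
        for (int i = 0; i < keyList.length; i++) {
            String value = conf.get(propList[i].trim());
            if (value != null) { // skip keys whose source property is unset
                mdc.put(keyList[i].trim(), value);
            }
        }
        return mdc;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(KEYS, "queryId");
        conf.put(VALUES_FROM, "hive.query.id");
        conf.put("hive.query.id", "query-1"); // hypothetical value set by Hive
        System.out.println(resolve(conf)); // {queryId=query-1}
    }
}
{code}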
Log4j2 is not required for this functionality; we could use the MDC from SLF4J.
Since Tez uses SLF4J almost everywhere, with classpath tricks we could make Tez
work with Log4j2 (we just need to put the Log4j2 jars and the SLF4J-to-Log4j2
bridge jar at the top of the classpath, and SLF4J will automatically bind to
Log4j2). If Tez calls non-standard Log4j 1.x APIs directly, this bridge might
not work.
> Tez should inject dag id, query id into MDC
> -------------------------------------------
>
> Key: TEZ-4039
> URL: https://issues.apache.org/jira/browse/TEZ-4039
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.9.next
> Reporter: Prasanth Jayachandran
> Priority: Major
>
> Tez currently uses CallableWithNdc to store thread-specific context. It
> should also inject the context into MDC so that the pattern layout can dump
> the contexts from MDC (with NDC it is not possible to read the context in
> the pattern layout).
> Hive, for example, sets queryId in the MDC and the pattern layout prints the
> queryId:
>
> {code:java}
> %d{ISO8601} %-5p [%t (%X{queryId})] %c{2}: %m%n
> {code}
> LLAP sets dagId, fragmentId and queryId into MDC, which is used for
> queryId-based log routing.
> Similarly, Tez AM should set dagId and queryId (if available) into MDC.
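For reference, a toy version of what a pattern layout does with %X{key}: it looks the key up in the MDC and substitutes the value, which is why keyed MDC entries can be printed by name while NDC entries cannot. This is not log4j's implementation; the class name and the dagId/queryId values are hypothetical, only %X{...} is handled, and in this sketch a missing key renders as an empty string.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy %X{key} expansion; real layouts (log4j, logback) support far more.
final class MdcPatternSketch {
    private static final Pattern MDC_REF = Pattern.compile("%X\\{([^}]*)\\}");

    static String render(String pattern, Map<String, String> mdc) {
        Matcher m = MDC_REF.matcher(pattern);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Missing keys render as empty strings in this sketch.
            String value = mdc.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> mdc = new HashMap<>();
        // What a Tez AM could set once a DAG starts (values hypothetical).
        mdc.put("dagId", "dag_1_0001_1");
        mdc.put("queryId", "query-1");
        System.out.println(render("[%X{queryId}] [%X{dagId}] msg", mdc));
        // [query-1] [dag_1_0001_1] msg
    }
}
{code}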