[ https://issues.apache.org/jira/browse/KNOX-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364878#comment-16364878 ]

Kevin Risden commented on KNOX-1091:
------------------------------------

Some thoughts after pondering this on and off today. It looks like the MDC is 
the middle piece that holds the UUID. I'm not convinced the MDC is being 
handled correctly with the Jetty thread pool: from what I gather, reused 
threads can cause issues with the MDC if it is not cleaned out between uses. I 
don't see anywhere that MDC.remove or MDC.clear is called except in tests.
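The thread-reuse leak can be sketched with plain JDK classes. This is a minimal simulation, not Knox code: SLF4J MDC implementations are typically backed by a thread-local map, which the {{MDC}} field below stands in for.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MdcLeakDemo {
    // Stand-in for the thread-local map that backs a real MDC implementation.
    static final ThreadLocal<Map<String, String>> MDC =
            ThreadLocal.withInitial(HashMap::new);

    public static void main(String[] args) throws Exception {
        // Single-thread pool: the same worker thread serves both "requests".
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // Request 1 stores a correlation id but never removes it.
        pool.submit(() -> MDC.get().put("trace-id", "7557c91b")).get();

        // Request 2 reuses the thread and inherits the stale id.
        String leaked = pool.submit(() -> MDC.get().get("trace-id")).get();
        System.out.println("request 2 sees trace-id = " + leaked);

        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With no remove/clear between submissions, the second task observes the first task's id, which is exactly the duplicate-correlation-id symptom in the audit log.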

Some references:
 * [http://ashtonkemerling.com/blog/2017/09/01/mdc-and-threadpools/]
 * [https://gquintana.github.io/2017/12/01/Structured-logging-with-SL-FJ-and-Logback.html]
 ** Specifically, from the above SLF4J and Logback link:

{quote}Let’s get back to the example, we saw the MDC stores extra information 
about logs. The MDC is usually based on a thread local variable, this has two 
drawbacks:
 # It must be properly cleaned after being used, or you may experience 
information leaks if the thread is reused. Think about thread pools in web 
servers like Tomcat.
 # The information may not be properly transferred from one thread to another. 
Think about asynchronous calls.
As a result, calling {{MDC.remove}}, like the above example, (or {{MDC.clear}}) 
is required to clean the MDC after usage. In order not to forget to do the 
housework afterwards, we can use a try-with-resource construct:
{{try (MDC.MDCCloseable mdc = MDC.putCloseable("userId", "gquintana")) { 
demoLogger.info("Hello world!"); }}}
It’s better but still verbose. Hopefully, this kind of code won’t make its way 
in your business code because it is usually hidden in an interceptor like a 
Servlet filter, a Spring aspect or a JAXRS interceptor. In Logback, there is a 
{{MDCInsertingServletFilter}} class which can serve as an example.
{quote}

[https://logback.qos.ch/xref/ch/qos/logback/classic/helpers/MDCInsertingServletFilter.html]
 is the class referenced there; it does the following to make sure the MDC is cleared:

{code:java}
public void doFilter(ServletRequest request, ServletResponse response,
        FilterChain chain) throws IOException, ServletException {
    insertIntoMDC(request);
    try {
        chain.doFilter(request, response);
    } finally {
        clearMDC();
    }
}
{code}
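The same populate/delegate/always-clear pattern, stripped of the servlet API so it runs standalone, could look like the sketch below. The class and method names are illustrative (not Knox or Logback code), and the thread-local map stands in for SLF4J's real {{MDC}}:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;

public class CorrelationScope {
    // Stand-in for SLF4J's MDC; real code would call org.slf4j.MDC.put/remove.
    static final ThreadLocal<Map<String, String>> MDC =
            ThreadLocal.withInitial(HashMap::new);

    // Mirrors MDCInsertingServletFilter.doFilter: populate, delegate, always clear.
    static <T> T withCorrelationId(Callable<T> request) throws Exception {
        MDC.get().put("correlation-id", UUID.randomUUID().toString());
        try {
            return request.call();
        } finally {
            // finally guarantees the id never leaks to the next request on this thread
            MDC.get().remove("correlation-id");
        }
    }

    public static void main(String[] args) throws Exception {
        String id1 = withCorrelationId(() -> MDC.get().get("correlation-id"));
        String id2 = withCorrelationId(() -> MDC.get().get("correlation-id"));
        System.out.println("ids differ: " + !id1.equals(id2));
        System.out.println("cleared after use: " + (MDC.get().get("correlation-id") == null));
    }
}
```

Because the remove happens in a finally block, each request gets a fresh UUID even when the pool reuses the thread, which is the fix the filter pattern provides.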


> Knox Audit Logging - duplicate correlation ids
> ----------------------------------------------
>
>                 Key: KNOX-1091
>                 URL: https://issues.apache.org/jira/browse/KNOX-1091
>             Project: Apache Knox
>          Issue Type: Bug
>          Components: Server
>            Reporter: Kevin Risden
>            Priority: Major
>
> From the Knox User list thread: "Multiple topology audit logging", it came to 
> my attention that Knox seems to be logging duplicate correlation ids. 
> Separating out the topic specifically here to dig a bit deeper.
> While looking at our Knox audit logs (Knox 0.9 on HDP 2.5) the "correlation 
> id" doesn't seem to be unique across requests. Is this to be expected? Here 
> is a snippet (anonymized):
> grep 7557c91b-2a48-4e09-aefc-44e9892372da /var/knox/gateway-audit.log
>  {code}
> 17/10/10 12:50:09 
> ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE||||access|uri|/gateway/HADOOPTEST/hbase/hbase/NAMESPACE1:TABLE1/ID1//|unavailable|Request
>  method: GET
> 17/10/10 12:50:09 
> ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE|USER1|||authentication|uri|/gateway/HADOOPPROD/hbase/NAMESPACE2:TABLE2/multiget?row=ID2%2fd%3araw&|success|
> 17/10/10 12:50:09 
> ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE|USER1|||authentication|uri|/gateway/HADOOPPROD/hbase/NAMESPACE2:TABLE2/multiget?row=ID2%2fd%3araw&|success|Groups:
>  []
> 17/10/10 12:50:09 
> ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE|USER1|||dispatch|uri|http://WEBHBASE.example.com:8084/NAMESPACE2:TABLE2/multiget?doAs=USER1&row=ID2%2Fd%3Araw|unavailable|Request
>  method: GET
> 17/10/10 12:50:09 
> ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE|USER1|||dispatch|uri|http://WEBHBASE.example.com:8084/NAMESPACE2:TABLE2/multiget?doAs=USER1&row=ID2%2Fd%3Araw|success|Response
>  status: 200
> 17/10/10 12:50:09 
> ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE|USER1|||access|uri|/gateway/HADOOPPROD/hbase/NAMESPACE2:TABLE2/multiget?row=ID2%2fd%3araw&|success|Response
>  status: 200
> 17/10/10 12:50:09 
> ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE||||authentication|principal|USER2|failure|LDAP
>  authentication failed.
> 17/10/10 12:50:09 
> ||7557c91b-2a48-4e09-aefc-44e9892372da|audit|WEBHBASE||||access|uri|/gateway/HADOOPTEST/hbase/hbase/NAMESPACE1:TABLE2/ID1//|success|Response
>  status: 401
> {code}
> The things to highlight here for the same correlation id:
> * different topologies are being used
> * different uris are being used
> * different users are being used
> Some of the things that we have configured that could impact results:
> * authentication caching
> * multiple Knox servers
> * load balancer in front of Knox



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
