elek opened a new pull request #895: HDDS-1636. Tracing id is not propagated 
via async datanode grpc call
URL: https://github.com/apache/hadoop/pull/895
 
 
   Recently a new exception became visible in the datanode logs when running 
standard freon (STANDALONE):
   
   {code}
   datanode_2  | 2019-06-03 12:18:21 WARN  PropagationRegistry$ExceptionCatchingExtractorDecorator:60 - Error when extracting SpanContext from carrier. Handling gracefully.
   datanode_2  | io.jaegertracing.internal.exceptions.MalformedTracerStateStringException: String does not match tracer state format: 7576cabf-37a4-4232-9729-939a3fdb68c4WriteChunk150a8a848a951784256ca0801f7d9cf8b_stream_ed583cee-9552-4f1a-8c77-63f7d07b755f_chunk_1
   datanode_2  |        at org.apache.hadoop.hdds.tracing.StringCodec.extract(StringCodec.java:49)
   datanode_2  |        at org.apache.hadoop.hdds.tracing.StringCodec.extract(StringCodec.java:34)
   datanode_2  |        at io.jaegertracing.internal.PropagationRegistry$ExceptionCatchingExtractorDecorator.extract(PropagationRegistry.java:57)
   datanode_2  |        at io.jaegertracing.internal.JaegerTracer.extract(JaegerTracer.java:208)
   datanode_2  |        at io.jaegertracing.internal.JaegerTracer.extract(JaegerTracer.java:61)
   datanode_2  |        at io.opentracing.util.GlobalTracer.extract(GlobalTracer.java:143)
   datanode_2  |        at org.apache.hadoop.hdds.tracing.TracingUtil.importAndCreateScope(TracingUtil.java:102)
   datanode_2  |        at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148)
   datanode_2  |        at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:73)
   datanode_2  |        at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:61)
   datanode_2  |        at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:248)
   datanode_2  |        at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
   datanode_2  |        at org.apache.ratis.thirdparty.io.grpc.Contexts$ContextualizedServerCallListener.onMessage(Contexts.java:76)
   datanode_2  |        at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
   datanode_2  |        at org.apache.hadoop.hdds.tracing.GrpcServerInterceptor$1.onMessage(GrpcServerInterceptor.java:46)
   datanode_2  |        at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:263)
   datanode_2  |        at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:686)
   datanode_2  |        at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
   datanode_2  |        at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
   datanode_2  |        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
   datanode_2  |        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
   {code}
   
   It turned out that tracing id propagation between the XceiverClient and 
the server does not work properly for Standalone pipelines with async commands:
   
    1. There are many places (on the client side) where the traceId is filled 
with UUID.randomUUID().toString();
    2. This random id is passed along between the Output/InputStream and 
different parts of the client.
    3. It is unnecessary, because in the XceiverClientGrpc and 
XceiverClientGrpc the traceId field is overridden with the real opentracing id 
anyway (sendCommand/sendCommandAsync)
    4. The exception is XceiverClientGrpc.sendCommandAsync, where this 
override is accidentally missing.
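
The override described in the list above can be pictured with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in: `Request` is not the real protobuf request type, `prepareForSend` is not a method of XceiverClientGrpc, and the supplier merely plays the role of "export the active OpenTracing span as a string". It only illustrates that the caller-supplied UUID is discarded before the request goes out.

```java
import java.util.UUID;
import java.util.function.Supplier;

public final class TraceIdOverrideSketch {

  /** Hypothetical stand-in for the command request; not the real protobuf type. */
  static final class Request {
    final String traceId;
    final String command;

    Request(String traceId, String command) {
      this.traceId = traceId;
      this.command = command;
    }

    /** Returns a copy that carries the given trace id. */
    Request withTraceId(String newTraceId) {
      return new Request(newTraceId, command);
    }
  }

  /**
   * Replaces whatever trace id the caller supplied with the exported span.
   * The supplier is an assumption standing in for the real span export.
   */
  static Request prepareForSend(Request request, Supplier<String> exportCurrentSpan) {
    return request.withTraceId(exportCurrentSpan.get());
  }

  public static void main(String[] args) {
    // The caller still passes a random UUID, as the client code did before the fix.
    Request fromCaller = new Request(UUID.randomUUID().toString(), "WriteChunk");
    // Made-up span string in traceId:spanId:parentId:flags form.
    Request onTheWire =
        prepareForSend(fromCaller, () -> "4bf92f3577b34da6:00f067aa0ba902b7:0:1");
    System.out.println(onTheWire.traceId);
  }
}
```

If both the sync and the async send paths go through the same preparation step, the UUID never reaches the server, which is why the client-side UUID generation can be dropped.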
   
   Things to fix:
   
    1. fix XceiverClientGrpc.sendCommandAsync (replace any existing traceId 
with the good one)
    2. remove the usage of the UUID based traceId (it's not used)
    3. Improve the error logging in case of an invalid traceId on the server 
side.
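
For the third item, a rough sketch of a server-side check that produces a one-line message instead of a full stack trace. It assumes the Jaeger textual span context layout `traceId:spanId:parentId:flags` with hexadecimal fields; the class and method names are hypothetical and this is not the actual StringCodec code.

```java
import java.util.regex.Pattern;

public final class TracerStateCheck {

  // Approximation of Jaeger's textual span context: four hex fields joined by ':'.
  private static final Pattern FORMAT =
      Pattern.compile("[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+");

  /** True if the value has the four-field hex layout assumed above. */
  static boolean looksLikeTracerState(String value) {
    return value != null && FORMAT.matcher(value).matches();
  }

  /** Builds a short message suitable for a single log line. */
  static String describe(String value) {
    if (looksLikeTracerState(value)) {
      return "ok";
    }
    return "Ignoring trace id that is not in traceId:spanId:parentId:flags form: " + value;
  }

  public static void main(String[] args) {
    // The UUID-based id from the log above has no ':' separators, so it is rejected.
    String fromLog = "7576cabf-37a4-4232-9729-939a3fdb68c4WriteChunk150a8a848a951784256ca0801f7d9cf8b"
        + "_stream_ed583cee-9552-4f1a-8c77-63f7d07b755f_chunk_1";
    System.out.println(describe(fromLog));
  }
}
```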
   
   See: https://issues.apache.org/jira/browse/HDDS-1636
