[ 
https://issues.apache.org/jira/browse/HDDS-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859095#comment-16859095
 ] 

Hudson commented on HDDS-1636:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #16707 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/16707/])
HDDS-1636. Tracing id is not propagated via async datanode grpc call (xyao: rev 
46b23c11b033c76b25897d61de53e9e36bb2b4b5)
* (edit) hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/client/ContainerOperationClient.java
* (edit) hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java
* (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/TestContainerStateMachineIdempotency.java
* (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/TestGetCommittedBlockLengthAndPutKey.java
* (edit) hadoop-hdds/client/src/test/java/org/apache/hadoop/hdds/scm/storage/TestChunkInputStream.java
* (edit) hadoop-ozone/objectstore-service/src/main/java/org/apache/hadoop/ozone/web/storage/DistributedStorageHandler.java
* (edit) hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/ChunkInputStream.java
* (edit) hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockOutputStream.java
* (edit) hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/tracing/StringCodec.java
* (edit) hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/storage/ContainerProtocolCalls.java
* (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/TestContainerSmallFile.java
* (edit) hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java
* (edit) hadoop-hdds/client/src/test/java/org/apache/hadoop/hdds/scm/storage/TestBlockInputStream.java
* (edit) hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/BlockOutputStreamEntry.java
* (edit) hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/KeyInputStream.java
* (edit) hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/TestXceiverClientManager.java
* (edit) hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockInputStream.java
* (edit) hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestChunkStreams.java


> Tracing id is not propagated via async datanode grpc call
> ---------------------------------------------------------
>
>                 Key: HDDS-1636
>                 URL: https://issues.apache.org/jira/browse/HDDS-1636
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.4.1
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Recently a new exception became visible in the datanode logs when running
> standard freon (STANDALONE):
> {code}
> datanode_2  | 2019-06-03 12:18:21 WARN  PropagationRegistry$ExceptionCatchingExtractorDecorator:60 - Error when extracting SpanContext from carrier. Handling gracefully.
> datanode_2  | io.jaegertracing.internal.exceptions.MalformedTracerStateStringException: String does not match tracer state format: 7576cabf-37a4-4232-9729-939a3fdb68c4WriteChunk150a8a848a951784256ca0801f7d9cf8b_stream_ed583cee-9552-4f1a-8c77-63f7d07b755f_chunk_1
> datanode_2  |         at org.apache.hadoop.hdds.tracing.StringCodec.extract(StringCodec.java:49)
> datanode_2  |         at org.apache.hadoop.hdds.tracing.StringCodec.extract(StringCodec.java:34)
> datanode_2  |         at io.jaegertracing.internal.PropagationRegistry$ExceptionCatchingExtractorDecorator.extract(PropagationRegistry.java:57)
> datanode_2  |         at io.jaegertracing.internal.JaegerTracer.extract(JaegerTracer.java:208)
> datanode_2  |         at io.jaegertracing.internal.JaegerTracer.extract(JaegerTracer.java:61)
> datanode_2  |         at io.opentracing.util.GlobalTracer.extract(GlobalTracer.java:143)
> datanode_2  |         at org.apache.hadoop.hdds.tracing.TracingUtil.importAndCreateScope(TracingUtil.java:102)
> datanode_2  |         at org.apache.hadoop.ozone.container.common.impl.HddsDispatcher.dispatch(HddsDispatcher.java:148)
> datanode_2  |         at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:73)
> datanode_2  |         at org.apache.hadoop.ozone.container.common.transport.server.GrpcXceiverService$1.onNext(GrpcXceiverService.java:61)
> datanode_2  |         at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onMessage(ServerCalls.java:248)
> datanode_2  |         at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
> datanode_2  |         at org.apache.ratis.thirdparty.io.grpc.Contexts$ContextualizedServerCallListener.onMessage(Contexts.java:76)
> datanode_2  |         at org.apache.ratis.thirdparty.io.grpc.ForwardingServerCallListener.onMessage(ForwardingServerCallListener.java:33)
> datanode_2  |         at org.apache.hadoop.hdds.tracing.GrpcServerInterceptor$1.onMessage(GrpcServerInterceptor.java:46)
> datanode_2  |         at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.messagesAvailable(ServerCallImpl.java:263)
> datanode_2  |         at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1MessagesAvailable.runInContext(ServerImpl.java:686)
> datanode_2  |         at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> datanode_2  |         at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> datanode_2  |         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> datanode_2  |         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> {code}
> It turned out that tracingId propagation between XceiverClient and the
> server does not work correctly for Standalone pipelines and async commands:
>  1. There are many places on the client side where the traceId is filled
> with UUID.randomUUID().toString().
>  2. This random id is passed along between the Output/InputStream classes
> and other parts of the client.
>  3. It is unnecessary, because in XceiverClientGrpc the traceId field is
> overridden with the real opentracing id anyway
> (sendCommand/sendCommandAsync).
>  4. The exception is XceiverClientGrpc.sendCommandAsync, where this override
> is accidentally missing.
> Things to fix:
>  1. Fix XceiverClientGrpc.sendCommandAsync (replace any existing traceId with
> the real opentracing id).
>  2. Remove the usage of the UUID-based traceId (it is not used).
>  3. Improve the error logging for an invalid traceId on the server
> side.
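
For illustration only, here is a minimal, self-contained sketch of the problem described above. It does not use the real HDDS or Jaeger classes: TraceIdSketch, Request, exportCurrentSpan and the two sendCommandAsync variants are hypothetical stand-ins, and the assumed tracer state format (four colon-separated hex fields, traceId:spanId:parentId:flags) is a simplification rather than the actual StringCodec logic. It shows why a UUID-based traceId is rejected on the server side and how overriding the field in the async path avoids that.

```java
import java.util.UUID;
import java.util.regex.Pattern;

/** Hypothetical stand-in for the client-side traceId handling. */
final class TraceIdSketch {

  // Assumption: tracer state looks like traceId:spanId:parentId:flags (hex fields).
  private static final Pattern TRACER_STATE = Pattern.compile(
      "[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+");

  /** Server-side style check: does the string look like an exported span? */
  static boolean isTracerState(String value) {
    return value != null && TRACER_STATE.matcher(value).matches();
  }

  /** Simplified request: only the traceId field matters for this sketch. */
  static final class Request {
    final String traceId;

    Request(String traceId) {
      this.traceId = traceId;
    }

    Request withTraceId(String newTraceId) {
      return new Request(newTraceId);
    }
  }

  /** Stand-in for exporting the active span; returns a fixed sample value. */
  static String exportCurrentSpan() {
    return "5b8aa5a2d2c872e8:5b8aa5a2d2c872e8:0:1";
  }

  /** Async path without the override: the caller-supplied id is sent as-is. */
  static Request sendCommandAsyncWithoutOverride(Request request) {
    return request;
  }

  /** Async path with the override: any existing traceId is replaced. */
  static Request sendCommandAsyncWithOverride(Request request) {
    return request.withTraceId(exportCurrentSpan());
  }

  public static void main(String[] args) {
    Request fromClient = new Request(UUID.randomUUID().toString());

    Request before = sendCommandAsyncWithoutOverride(fromClient);
    Request after = sendCommandAsyncWithOverride(fromClient);

    System.out.println("without override valid: " + isTracerState(before.traceId));
    System.out.println("with override valid:    " + isTracerState(after.traceId));
  }
}
```

Once the override is applied in every send path, the UUID supplied by the caller is never read, which is why the description proposes removing it.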



