[
https://issues.apache.org/jira/browse/IMPALA-10049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419888#comment-17419888
]
Riza Suminto commented on IMPALA-10049:
---------------------------------------
call_id is recently exposed through RpcController class in this commit:
[https://github.com/apache/kudu/commit/4438a15f9238f818e9af1230521e8d7c4924f92b]
After we rebase this KRPC change to Impala code, we should able to log the
client_id in the sender side.
I see that slow RPC's call_id is logged already in the receiver side like this:
{code:java}
I0923 16:12:43.990563 8667 krpc-data-stream-mgr.cc:372] Sender 127.0.0.1 timed
out waiting for receiver fragment instance: 854be678cb62de18:80c91e9600000005,
dest node: 4
I0923 16:12:43.990624 8667 rpcz_store.cc:269] Call
impala.DataStreamService.TransmitData from 127.0.0.1:38918 (request call id 1)
took 6030ms. Request Metrics: {}
I0923 16:12:43.990684 8667 rpcz_store.cc:273] Trace:
0923 16:12:37.960211 (+ 0us) impala-service-pool.cc:179] Inserting onto call
queue
0923 16:12:37.960262 (+ 51us) impala-service-pool.cc:278] Handling call
0923 16:12:37.960308 (+ 46us) krpc-data-stream-mgr.cc:228] Added early sender
0923 16:12:43.990514 (+6030206us) krpc-data-stream-mgr.cc:364] Timed out sender
0923 16:12:43.990617 (+ 103us) inbound_call.cc:162] Queueing success response
Metrics: {}{code}
Adding call_id in the sender log for slow RPC will help us make association and
analyze wether the slowdown happens in network or RPC handler. In the meantime,
I'm assigning this Jira to myself.
> Include RPC call_id in slow RPC logs
> ------------------------------------
>
> Key: IMPALA-10049
> URL: https://issues.apache.org/jira/browse/IMPALA-10049
> Project: IMPALA
> Issue Type: Sub-task
> Components: Distributed Exec
> Reporter: Sahil Takiar
> Priority: Major
> Labels: observability
>
> The current code for logging slow RPCs on the sender side looks something
> like this:
> {code:java}
> template <typename ResponsePBType>
> void KrpcDataStreamSender::Channel::LogSlowRpc(
> ¦ const char* rpc_name, int64_t total_time_ns, const ResponsePBType& resp) {
> int64_t network_time_ns = total_time_ns - resp_.receiver_latency_ns();
> LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
> ¦ ¦ ¦ ¦ ¦ << " (fragment_instance_id=" << PrintId(fragment_instance_id_) <<
> "): "
> ¦ ¦ ¦ ¦ ¦ << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS)
> << ". "
> ¦ ¦ ¦ ¦ ¦ << "Receiver time: "
> ¦ ¦ ¦ ¦ ¦ << PrettyPrinter::Print(resp_.receiver_latency_ns(),
> TUnit::TIME_NS)
> ¦ ¦ ¦ ¦ ¦ << " Network time: " << PrettyPrinter::Print(network_time_ns,
> TUnit::TIME_NS);
> }
> void KrpcDataStreamSender::Channel::LogSlowFailedRpc(
> ¦ const char* rpc_name, int64_t total_time_ns, const kudu::Status& err) {
> LOG(INFO) << "Slow " << rpc_name << " RPC to " << address_
> ¦ ¦ ¦ ¦ ¦ << " (fragment_instance_id=" << PrintId(fragment_instance_id_) <<
> "): "
> ¦ ¦ ¦ ¦ ¦ << "took " << PrettyPrinter::Print(total_time_ns, TUnit::TIME_NS)
> << ". "
> ¦ ¦ ¦ ¦ ¦ << "Error: " << err.ToString();
> } {code}
> It would be nice to include the call_id in the logs as well so that RPCs can
> more easily be traced. The RPC call_id is dumped in RPC traces on the
> receiver side, as well as in the /rpcz output on the debug ui.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]