[ 
https://issues.apache.org/jira/browse/IMPALA-10139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189698#comment-17189698
 ] 

Sahil Takiar commented on IMPALA-10139:
---------------------------------------

This is pretty easy to reproduce on master. I just ran the query "select * from 
functional.alltypes as a2, functional.alltypes as a1" and didn't fetch any 
results. A bunch of RPCs get sent, but are not processed because queues are 
probably full. Then the logs contain entries like:
{code:java}
I0902 13:25:34.797029 17168 rpcz_store.cc:269] Call 
impala.DataStreamService.TransmitData from 127.0.0.1:33354 (request call id 
6737) took 218496ms. Request Metrics: {}
I0902 13:25:34.797061 17168 rpcz_store.cc:273] Trace:
0902 13:21:56.300996 (+     0us) impala-service-pool.cc:170] Inserting onto 
call queue
0902 13:21:56.301037 (+    41us) impala-service-pool.cc:269] Handling call
0902 13:21:56.301048 (+    11us) krpc-data-stream-recvr.cc:325] Enqueuing 
deferred RPC
0902 13:25:34.757315 (+218456267us) krpc-data-stream-recvr.cc:504] Processing 
deferred RPC
0902 13:25:34.757317 (+     2us) krpc-data-stream-recvr.cc:524] Batch queue is 
full
0902 13:25:34.757319 (+     2us) krpc-data-stream-recvr.cc:504] Processing 
deferred RPC
0902 13:25:34.757320 (+     1us) krpc-data-stream-recvr.cc:524] Batch queue is 
full
0902 13:25:34.796800 (+ 39480us) krpc-data-stream-recvr.cc:504] Processing 
deferred RPC
0902 13:25:34.796803 (+     3us) krpc-data-stream-recvr.cc:397] Deserializing 
batch
0902 13:25:34.797011 (+   208us) krpc-data-stream-recvr.cc:424] Enqueuing 
deserialized batch
0902 13:25:34.797021 (+    10us) inbound_call.cc:162] Queueing success response
Metrics: {}
I0902 13:25:34.797154 17105 krpc-data-stream-sender.cc:394] Slow TransmitData 
RPC to 127.0.0.1:27000 
(fragment_instance_id=d447645333af3b77:671fbefe00000000): took 3m38s. Receiver 
time: 3m38s Network time: 239.735us
I0902 13:25:34.797215  3684 krpc-data-stream-sender.cc:428] 
d447645333af3b77:671fbefe00000005] Long delay waiting for RPC to 
127.0.0.1:27000 (fragment_instance_id=d447645333af3b77:671fbefe00000000): took 
3m38s {code}

> Slow RPC logs can be misleading
> -------------------------------
>
>                 Key: IMPALA-10139
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10139
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Sahil Takiar
>            Priority: Major
>
> The slow RPC logs added in IMPALA-9128 are based on the total time taken to 
> successfully complete a RPC. The issue is that there are many reasons why an 
> RPC might take a long time to complete. An RPC is considered complete only 
> when the receiver has processed that RPC. 
> The problem is that due to client-driven back-pressure mechanism, it is 
> entirely possible that the receiver RPC does not process a receiver RPC 
> because {{KrpcDataStreamRecvr::SenderQueue::GetBatch}} just hasn't been 
> called yet (indirectly called by {{ExchangeNode::GetNext}}).
> This can lead to flood of slow RPC logs, even though the RPCs might not 
> actually be slow themselves. What is worse is that the because of the 
> back-pressure mechanism, slowness from the client (e.g. Hue users) will 
> propagate across all nodes involved in the query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to