[
https://issues.apache.org/jira/browse/IMPALA-10139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189698#comment-17189698
]
Sahil Takiar commented on IMPALA-10139:
---------------------------------------
This is pretty easy to reproduce on master. I just ran the query "select * from
functional.alltypes as a2, functional.alltypes as a1" and didn't fetch any
results. A bunch of RPCs get sent, but are not processed because queues are
probably full. Then the logs contain entries like:
{code:java}
I0902 13:25:34.797029 17168 rpcz_store.cc:269] Call
impala.DataStreamService.TransmitData from 127.0.0.1:33354 (request call id
6737) took 218496ms. Request Metrics: {}
I0902 13:25:34.797061 17168 rpcz_store.cc:273] Trace:
0902 13:21:56.300996 (+ 0us) impala-service-pool.cc:170] Inserting onto
call queue
0902 13:21:56.301037 (+ 41us) impala-service-pool.cc:269] Handling call
0902 13:21:56.301048 (+ 11us) krpc-data-stream-recvr.cc:325] Enqueuing
deferred RPC
0902 13:25:34.757315 (+218456267us) krpc-data-stream-recvr.cc:504] Processing
deferred RPC
0902 13:25:34.757317 (+ 2us) krpc-data-stream-recvr.cc:524] Batch queue is
full
0902 13:25:34.757319 (+ 2us) krpc-data-stream-recvr.cc:504] Processing
deferred RPC
0902 13:25:34.757320 (+ 1us) krpc-data-stream-recvr.cc:524] Batch queue is
full
0902 13:25:34.796800 (+ 39480us) krpc-data-stream-recvr.cc:504] Processing
deferred RPC
0902 13:25:34.796803 (+ 3us) krpc-data-stream-recvr.cc:397] Deserializing
batch
0902 13:25:34.797011 (+ 208us) krpc-data-stream-recvr.cc:424] Enqueuing
deserialized batch
0902 13:25:34.797021 (+ 10us) inbound_call.cc:162] Queueing success response
Metrics: {}
I0902 13:25:34.797154 17105 krpc-data-stream-sender.cc:394] Slow TransmitData
RPC to 127.0.0.1:27000
(fragment_instance_id=d447645333af3b77:671fbefe00000000): took 3m38s. Receiver
time: 3m38s Network time: 239.735us
I0902 13:25:34.797215 3684 krpc-data-stream-sender.cc:428]
d447645333af3b77:671fbefe00000005] Long delay waiting for RPC to
127.0.0.1:27000 (fragment_instance_id=d447645333af3b77:671fbefe00000000): took
3m38s {code}
> Slow RPC logs can be misleading
> -------------------------------
>
> Key: IMPALA-10139
> URL: https://issues.apache.org/jira/browse/IMPALA-10139
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Sahil Takiar
> Priority: Major
>
> The slow RPC logs added in IMPALA-9128 are based on the total time taken to
> successfully complete a RPC. The issue is that there are many reasons why an
> RPC might take a long time to complete. An RPC is considered complete only
> when the receiver has processed that RPC.
> The problem is that due to client-driven back-pressure mechanism, it is
> entirely possible that the receiver RPC does not process a receiver RPC
> because {{KrpcDataStreamRecvr::SenderQueue::GetBatch}} just hasn't been
> called yet (indirectly called by {{ExchangeNode::GetNext}}).
> This can lead to flood of slow RPC logs, even though the RPCs might not
> actually be slow themselves. What is worse is that the because of the
> back-pressure mechanism, slowness from the client (e.g. Hue users) will
> propagate across all nodes involved in the query.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]