[
https://issues.apache.org/jira/browse/IMPALA-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17361492#comment-17361492
]
Quanlong Huang commented on IMPALA-6294:
----------------------------------------
FWIW, IMPALA-10578 is a similar issue, but the cause there ultimately turned out
to be a poor configuration: only one rotational disk was configured for
spilling, and that disk was also used for logging. Spilling saturated the disk,
which blocked logging and eventually blocked RPCs.
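To avoid that kind of contention, scratch directories can be kept off the log
disk. A minimal sketch of the relevant impalad startup flags, assuming /data/1
and /data/2 are dedicated scratch disks separate from the log disk (the paths
and disk layout are illustrative, not taken from the ticket):

```shell
# Hypothetical layout: spill to two dedicated disks, keep logs elsewhere.
# --scratch_dirs and --log_dir are impalad startup flags; the paths are assumptions.
impalad --scratch_dirs=/data/1/impala-scratch,/data/2/impala-scratch \
        --log_dir=/var/log/impala
```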
> Concurrent hung with lots of spilling make slow progress due to blocking in
> DataStreamRecvr and DataStreamSender
> ----------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-6294
> URL: https://issues.apache.org/jira/browse/IMPALA-6294
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.11.0
> Reporter: Mostafa Mokhtar
> Assignee: Michael Ho
> Priority: Critical
> Attachments: IMPALA-6285 TPCDS Q3 slow broadcast,
> slow_broadcast_q3_reciever.txt, slow_broadcast_q3_sender.txt
>
>
> While running a highly concurrent spilling workload on a large cluster,
> queries start running slower; even lightweight queries that are not spilling
> are affected by this slowdown.
> {code}
> EXCHANGE_NODE (id=9):(Total: 3m1s, non-child: 3m1s, % non-child: 100.00%)
>    - ConvertRowBatchTime: 999.990us
>    - PeakMemoryUsage: 0
>    - RowsReturned: 108.00K (108001)
>    - RowsReturnedRate: 593.00 /sec
>   DataStreamReceiver:
>     BytesReceived(4s000ms): 254.47 KB, 338.82 KB, 338.82 KB, 852.43 KB,
>       1.32 MB, 1.33 MB, 1.50 MB, 2.53 MB, 2.99 MB, 3.00 MB, 3.00 MB,
>       3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB,
>       3.16 MB, 3.49 MB, 3.80 MB, 4.15 MB, 4.55 MB, 4.84 MB, 4.99 MB,
>       5.07 MB, 5.41 MB, 5.75 MB, 5.92 MB, 6.00 MB, 6.00 MB, 6.00 MB,
>       6.07 MB, 6.28 MB, 6.33 MB, 6.43 MB, 6.67 MB, 6.91 MB, 7.29 MB,
>       8.03 MB, 9.12 MB, 9.68 MB, 9.90 MB, 9.97 MB, 10.44 MB, 11.25 MB
>      - BytesReceived: 11.73 MB (12301692)
>      - DeserializeRowBatchTimer: 957.990ms
>      - FirstBatchArrivalWaitTime: 0.000ns
>      - PeakMemoryUsage: 644.44 KB (659904)
>      - SendersBlockedTimer: 0.000ns
>      - SendersBlockedTotalTimer(*): 0.000ns
> {code}
> {code}
> DataStreamSender (dst_id=9):(Total: 1s819ms, non-child: 1s819ms, % non-child: 100.00%)
> - BytesSent: 234.64 MB (246033840)
> - NetworkThroughput(*): 139.58 MB/sec
> - OverallThroughput: 128.92 MB/sec
> - PeakMemoryUsage: 33.12 KB (33920)
> - RowsReturned: 108.00K (108001)
> - SerializeBatchTime: 133.998ms
> - TransmitDataRPCTime: 1s680ms
> - UncompressedRowBatchSize: 446.42 MB (468102200)
> {code}
> The timeouts seen in IMPALA-6285 are caused by this issue:
> {code}
> I1206 12:44:14.925405 25274 status.cc:58] RPC recv timed out: Client foo-17.domain.com:22000 timed-out during recv call.
> @ 0x957a6a impala::Status::Status()
> @ 0x11dd5fe impala::DataStreamSender::Channel::DoTransmitDataRpc()
> @ 0x11ddcd4 impala::DataStreamSender::Channel::TransmitDataHelper()
> @ 0x11de080 impala::DataStreamSender::Channel::TransmitData()
> @ 0x11e1004 impala::ThreadPool<>::WorkerThread()
> @ 0xd10063 impala::Thread::SuperviseThread()
> @ 0xd107a4 boost::detail::thread_data<>::run()
> @ 0x128997a (unknown)
> @ 0x7f68c5bc7e25 start_thread
> @ 0x7f68c58f534d __clone
> {code}
> A similar behavior was also observed with KRPC enabled (IMPALA-6048).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]