[
https://issues.apache.org/jira/browse/IMPALA-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17361492#comment-17361492
]
Quanlong Huang commented on IMPALA-6294:
----------------------------------------
FWIW, IMPALA-10578 is a similar issue, but the cause there ultimately turned out
to be a poor configuration: only one rotational disk was configured for
spilling, and that disk was also used for logging. Spilling saturated the disk,
which blocked logging and eventually blocked RPCs.
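To avoid that kind of contention, scratch directories can be kept off the log
disk. A minimal sketch of the relevant impalad startup flags, assuming /data/1
and /data/2 are dedicated scratch disks separate from the log disk (the paths
and disk layout are illustrative, not taken from the ticket):

```shell
# Hypothetical layout: spill to two dedicated disks, keep logs elsewhere.
# --scratch_dirs and --log_dir are impalad startup flags; the paths are assumptions.
impalad --scratch_dirs=/data/1/impala-scratch,/data/2/impala-scratch \
        --log_dir=/var/log/impala
```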
> Concurrent hung with lots of spilling make slow progress due to blocking in
> DataStreamRecvr and DataStreamSender
> ----------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-6294
> URL: https://issues.apache.org/jira/browse/IMPALA-6294
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.11.0
> Reporter: Mostafa Mokhtar
> Assignee: Michael Ho
> Priority: Critical
> Attachments: IMPALA-6285 TPCDS Q3 slow broadcast,
> slow_broadcast_q3_reciever.txt, slow_broadcast_q3_sender.txt
>
>
> While running a highly concurrent spilling workload on a large cluster,
> queries start running slower; even lightweight queries that are not spilling
> are affected by this slowdown.
> {code}
> EXCHANGE_NODE (id=9):(Total: 3m1s, non-child: 3m1s, % non-child: 100.00%)
>    - ConvertRowBatchTime: 999.990us
>    - PeakMemoryUsage: 0
>    - RowsReturned: 108.00K (108001)
>    - RowsReturnedRate: 593.00 /sec
>   DataStreamReceiver:
>     BytesReceived(4s000ms): 254.47 KB, 338.82 KB, 338.82 KB, 852.43 KB,
>       1.32 MB, 1.33 MB, 1.50 MB, 2.53 MB, 2.99 MB, 3.00 MB, 3.00 MB,
>       3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB,
>       3.16 MB, 3.49 MB, 3.80 MB, 4.15 MB, 4.55 MB, 4.84 MB, 4.99 MB,
>       5.07 MB, 5.41 MB, 5.75 MB, 5.92 MB, 6.00 MB, 6.00 MB, 6.00 MB,
>       6.07 MB, 6.28 MB, 6.33 MB, 6.43 MB, 6.67 MB, 6.91 MB, 7.29 MB,
>       8.03 MB, 9.12 MB, 9.68 MB, 9.90 MB, 9.97 MB, 10.44 MB, 11.25 MB
>      - BytesReceived: 11.73 MB (12301692)
>      - DeserializeRowBatchTimer: 957.990ms
>      - FirstBatchArrivalWaitTime: 0.000ns
>      - PeakMemoryUsage: 644.44 KB (659904)
>      - SendersBlockedTimer: 0.000ns
>      - SendersBlockedTotalTimer(*): 0.000ns
> {code}
> {code}
> DataStreamSender (dst_id=9):(Total: 1s819ms, non-child: 1s819ms, % non-child: 100.00%)
> - BytesSent: 234.64 MB (246033840)
> - NetworkThroughput(*): 139.58 MB/sec
> - OverallThroughput: 128.92 MB/sec
> - PeakMemoryUsage: 33.12 KB (33920)
> - RowsReturned: 108.00K (108001)
> - SerializeBatchTime: 133.998ms
> - TransmitDataRPCTime: 1s680ms
> - UncompressedRowBatchSize: 446.42 MB (468102200)
> {code}
> The timeouts seen in IMPALA-6285 are caused by this issue:
> {code}
> I1206 12:44:14.925405 25274 status.cc:58] RPC recv timed out: Client foo-17.domain.com:22000 timed-out during recv call.
> @ 0x957a6a impala::Status::Status()
> @ 0x11dd5fe impala::DataStreamSender::Channel::DoTransmitDataRpc()
> @ 0x11ddcd4 impala::DataStreamSender::Channel::TransmitDataHelper()
> @ 0x11de080 impala::DataStreamSender::Channel::TransmitData()
> @ 0x11e1004 impala::ThreadPool<>::WorkerThread()
> @ 0xd10063 impala::Thread::SuperviseThread()
> @ 0xd107a4 boost::detail::thread_data<>::run()
> @ 0x128997a (unknown)
> @ 0x7f68c5bc7e25 start_thread
> @ 0x7f68c58f534d __clone
> {code}
> A similar behavior was also observed with KRPC enabled (IMPALA-6048).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]