[
https://issues.apache.org/jira/browse/IMPALA-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Ho updated IMPALA-6088:
-------------------------------
Parent Issue: IMPALA-7301 (was: IMPALA-5865)
> Rack aware broadcast operator
> -----------------------------
>
> Key: IMPALA-6088
> URL: https://issues.apache.org/jira/browse/IMPALA-6088
> Project: IMPALA
> Issue Type: Sub-task
> Components: Distributed Exec
> Reporter: Mostafa Mokhtar
> Priority: Major
>
> When conducting large scale experiments on a 6 rack cluster with aggregator
> core network topology overall cluster bandwidth utilization was limited.
> With aggregator core networks nodes and racks are not equidistant, which
> means a broadcast operation can be inefficient as the broadcasting node needs
> to send the same data N times to each node on a remote rack.
> Ideally Rowbatches should be sent once per remote rack then a node on each
> remote rack would broadcast within its rack.
> Table below represent rack to rack latency for the 90% of operations, ration
> between best and worst case is 7.3x
> | || va|| vc|| vd1|| vd3|| ve|
> |va| 4,238| 4,290| 9,692| 8,897| 8,208|
> |vc| 9,290| 4,396| 30,952| 13,529| 14,578|
> |vd1| 9,131| 29,066| 4,346| 17,265| 16,849|
> |vd3| 7,409| 15,517| 17,265| 4,370| 4,687|
> |ve| 4,914| 16,894| 16,430| 4,713| 4,472|
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]