[
https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16752356#comment-16752356
]
Thomas Graves commented on SPARK-22229:
---------------------------------------
This is interesting; a few questions:
* I'm assuming all the data has to fit into memory for this to work? Or does it
somehow handle spill files by pulling them into memory and then transferring?
Does it fail if it's not all in memory?
* The benchmark data sizes I saw all appeared to fit into memory; is that
right?
* Did you performance-test with both RDMA over Ethernet and InfiniBand?
* To clarify the above question: is the implementation in the Mellanox/SparkRDMA
GitHub repository stable, or not yet complete?
* The SPIP mentions: "MapStatuses are redundant – no need for those extra
transfers that take precious seconds in many jobs." -> How does the reducer
know where to fetch map output from, then? It still needs to know a host, and
perhaps a memory location, unless the host it's fetching from can locate the
data purely from the map ID and reduce ID.
* I assume this is only supported with the external shuffle service disabled
(which probably doesn't apply anyway, since you have a different shuffle
manager) and without dynamic allocation?
* Depending on the answers above: if it's all in memory, I assume that when an
executor goes down its tasks have to be rerun, since the output isn't on disk
for the external shuffle service to serve up.
* If someone were to try this out: the SPIP says "SparkRDMA manages its own
memory, off-heap". I take that to mean that, in addition to Spark's normal
memory usage, you need to give the Spark executor enough off-heap memory to
account for whatever size you are shuffling?
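For anyone wanting to try this out, a minimal spark-defaults.conf sketch is
below. The shuffle-manager class name follows the Mellanox/SparkRDMA project's
naming; the jar path is a placeholder and the memory sizes are illustrative
assumptions (size the overhead to your actual shuffle volume), not
recommendations from the SPIP:

```properties
# Put the SparkRDMA plugin jar on the classpath (placeholder path)
spark.driver.extraClassPath        /path/to/spark-rdma.jar
spark.executor.extraClassPath      /path/to/spark-rdma.jar

# Swap in the RDMA shuffle manager (class name per the Mellanox/SparkRDMA project)
spark.shuffle.manager              org.apache.spark.shuffle.rdma.RdmaShuffleManager

# SparkRDMA manages its own off-heap buffers, so leave headroom beyond the heap
# (illustrative values -- tune to the shuffle size in question)
spark.executor.memory              8g
spark.executor.memoryOverhead      8g

# Per the questions above, the external shuffle service and dynamic
# allocation presumably need to stay off with a custom shuffle manager
spark.shuffle.service.enabled      false
spark.dynamicAllocation.enabled    false
```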
> SPIP: RDMA Accelerated Shuffle Engine
> -------------------------------------
>
> Key: SPARK-22229
> URL: https://issues.apache.org/jira/browse/SPARK-22229
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.0, 2.4.0, 3.0.0
> Reporter: Yuval Degani
> Priority: Major
> Attachments:
> SPARK-22229_SPIP_RDMA_Accelerated_Shuffle_Engine_Rev_1.0.pdf
>
>
> An RDMA-accelerated shuffle engine can provide enormous performance benefits
> to shuffle-intensive Spark jobs, as demonstrated in the “SparkRDMA” plugin
> open-source project ([https://github.com/Mellanox/SparkRDMA]).
> Using RDMA for shuffle significantly improves CPU utilization and reduces I/O
> processing overhead by bypassing the kernel and networking stack and by
> avoiding memory copies entirely. Those valuable CPU cycles are then spent
> directly on the actual Spark workload, helping to reduce the job runtime
> significantly.
> This performance gain has been demonstrated both with the industry-standard
> HiBench TeraSort benchmark (a 1.5x speedup in sorting) and with
> shuffle-intensive customer applications.
> SparkRDMA will be presented at Spark Summit 2017 in Dublin
> ([https://spark-summit.org/eu-2017/events/accelerating-shuffle-a-tailor-made-rdma-solution-for-apache-spark/]).
> Please see attached proposal document for more information.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)