Re: SPARK-25299: Updates As Of December 19, 2018

Peter Rudenko Thu, 03 Jan 2019 05:24:10 -0800

Hi Matt, I'm a developer of the SparkRDMA shuffle manager:
https://github.com/Mellanox/SparkRDMA
Thanks for your efforts to improve the Spark shuffle API. We are very
interested in participating. For now, I have several comments:
1. I went through these four documents:

https://docs.google.com/document/d/1tglSkfblFhugcjFXZOxuKsCdxfrHBXfxgTs-sbbNB3c/edit#

https://docs.google.com/document/d/1TA-gDw3ophy-gSu2IAW_5IMbRK_8pWBeXJwngN9YB80/edit

https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40

https://docs.google.com/document/d/1kSpbBB-sDk41LeORm3-Hfr-up98Ozm5wskvB49tUhSs/edit#
As I understand it, there are two discussions: improving the shuffle manager
API itself (the Splash manager) and improving the external shuffle service
<https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.9o9f7nm01fz6>.
2. We may want to revisit the SPIP: RDMA Accelerated Shuffle Engine
<https://issues.apache.org/jira/browse/SPARK-22229> and decide whether to support
RDMA in the main codebase, or at least as a first-class shuffle plugin (there
are not many other open-source shuffle plugins). We are actively developing it
and adding new features. RDMA is now available on Azure
(https://azure.microsoft.com/en-us/blog/introducing-the-new-hb-and-hc-azure-vm-sizes-for-hpc/),
Alibaba, and other cloud providers. For now we support only memory-to-memory
transfer, but RDMA is extensible to NVM and GPU data transfer.
3. We have users who are interested in this feature
(https://issues.apache.org/jira/browse/SPARK-12196); we could consider adding
it to the new API.

Let me know if you need help with review, testing, or benchmarking.
I'll take a closer look at the documents and the PR.

Thanks,
Peter Rudenko
Software engineer at Mellanox Technologies.

On Wed, 19 Dec 2018 at 20:54, John Zhuge <john.zh...@gmail.com> wrote:

> Matt, appreciate the update!
>
> On Wed, Dec 19, 2018 at 10:51 AM Matt Cheah <mch...@palantir.com> wrote:
>
>> Hi everyone,
>>
>> Earlier this year, we proposed SPARK-25299
>> <https://issues.apache.org/jira/browse/SPARK-25299>, which introduces the
>> idea of using other storage systems for persisting shuffle files. Since
>> then, we have continued to work on prototypes for this project. In the
>> interest of increasing transparency into our work, we have created a
>> progress report document
>> <https://docs.google.com/document/d/1tglSkfblFhugcjFXZOxuKsCdxfrHBXfxgTs-sbbNB3c/edit?usp=sharing>
>> where you may find a summary of the work we have been doing, as well as
>> links to our prototypes on GitHub. We would ask anyone who is very
>> familiar with the inner workings of Spark’s shuffle to provide feedback
>> and comments on our work thus far. We welcome any further discussion in
>> this space. You may comment in this e-mail thread or on the progress
>> report document.
>>
>> Looking forward to hearing from you. Thanks,
>>
>> -Matt Cheah
>>
>
>
> --
> John
>
