FUSE is another candidate (https://wiki.apache.org/hadoop/MountableHDFS),
but it was not very stable when I tried it.
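
If you want to experiment with it anyway, the mount step is roughly the
following (a sketch only; the hadoop-fuse-dfs wrapper name is from my memory
of the CDH packaging, and the namenode address and mount point are
placeholders):

    # mount the HDFS namespace at /mnt/hdfs via FUSE
    # assumes the FUSE kernel module is loaded and Hadoop is on the node
    hadoop-fuse-dfs dfs://namenode:8020 /mnt/hdfs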

On Wed, Aug 24, 2016 at 10:09 PM, Sun Rui <sunrise_...@163.com> wrote:

> For HDFS, maybe you can try mounting HDFS as NFS. But I am not sure about
> the stability, and there is also the additional overhead of network I/O and
> HDFS file replication.
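>
> If you do try it, the HDFS NFS gateway route looks roughly like this (a
> sketch; the gateway host and mount point are placeholders, and rpcbind or
> the HDFS portmap service must already be running):
>
>     # start the NFSv3 gateway on a node that can reach the namenode
>     hdfs nfs3 &
>     # mount the HDFS root over NFSv3 (options per the Hadoop NFS gateway docs)
>     mount -t nfs -o vers=3,proto=tcp,nolock,noacl,sync <gateway_host>:/ /mnt/hdfs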
>
> On Aug 24, 2016, at 21:02, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>
> Spark shuffle uses the Java File API to create local dirs and read/write
> data, so it only works with filesystems the OS supports. It doesn't go
> through the Hadoop FileSystem API, so writing to a Hadoop-compatible FS
> does not work.
>
> Also, it is not suitable to write temporary shuffle data to a distributed
> FS; that would bring unnecessary overhead. In your case, if you have large
> memory on each node, you could use ramfs instead to store the shuffle data.
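>
> For example, something like the following (a sketch; I use tmpfs here as a
> size-capped variant of ramfs, and the size and path are placeholders):
>
>     # back Spark's local/shuffle dirs with RAM
>     mount -t tmpfs -o size=64g tmpfs /mnt/spark-local
>
>     # spark-defaults.conf
>     # note: on YARN this is overridden by the NodeManager's local dirs
>     spark.local.dir /mnt/spark-local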
>
> Thanks
> Saisai
>
> On Wed, Aug 24, 2016 at 8:11 PM, tony....@tendcloud.com <
> tony....@tendcloud.com> wrote:
>
>> Hi, All,
>> When we run Spark on very large data, Spark will shuffle, and the
>> shuffle data will be written to local disk. Because we have limited capacity
>> on local disk, the shuffled data fills up the local disk and the job
>> fails. So is there a way we can write the shuffle spill data to
>> HDFS? Or, if we introduce Alluxio into our system, can the shuffled data be
>> written to Alluxio?
>>
>> Thanks and Regards,
>>
>> ------------------------------
>> Yan Zhitao (Tony)
>>
>> Beijing TendCloud Technology Co., Ltd. (TalkingData)
>> --------------------------------------------------------------
>> Email: tony....@tendcloud.com
>> Phone: 13911815695
>> WeChat: zhitao_yan
>> QQ: 4707059
>> Address: Room 602, Aviation Service Building, Building 2, Courtyard 39,
>> Dongzhimenwai Street, Dongcheng District, Beijing
>> Postal code: 100027
>> --------------------------------------------------------------
>> TalkingData.com <http://talkingdata.com/> - Let data speak
>>
>
>
>
