FUSE is another candidate (https://wiki.apache.org/hadoop/MountableHDFS), but it was not very stable when I tried it before.
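For reference, mounting HDFS through FUSE along the lines of the MountableHDFS wiki page looks roughly like the sketch below. This is a hedged illustration only: the package/binary name varies by distribution (`fuse_dfs` upstream, `hadoop-fuse-dfs` in some packagings), and the NameNode host, port, and mount point here are made-up placeholders.

```shell
# Sketch only -- hostnames, port, and paths are illustrative, not from the thread.
# Requires root and a FUSE-enabled kernel; the fuse_dfs binary ships with Hadoop
# (built from the hadoop-hdfs native project) or as a distro package.
sudo mkdir -p /mnt/hdfs

# Mount the cluster's HDFS namespace at /mnt/hdfs (NameNode address is an assumption):
sudo hadoop-fuse-dfs dfs://namenode.example.com:8020 /mnt/hdfs

# After this, HDFS shows up as an ordinary directory tree, so plain Java File I/O
# (which is what Spark shuffle uses) can read and write under it:
ls /mnt/hdfs
```

Note that even when this works, every shuffle write goes over the network and through HDFS replication, which is exactly the overhead the replies below warn about.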
On Wed, Aug 24, 2016 at 10:09 PM, Sun Rui <sunrise_...@163.com> wrote:

> For HDFS, maybe you can try mounting HDFS as NFS. I am not sure about the
> stability, though, and there is also the additional overhead of network I/O
> and of HDFS file replication.
>
> On Aug 24, 2016, at 21:02, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>
> Spark shuffle uses the Java File API to create local dirs and read/write
> data, so it only works with OS-supported file systems. It does not use the
> Hadoop FileSystem API, so writing to a Hadoop-compatible FS does not work.
>
> Also, it is not suitable to write temporary shuffle data to a distributed
> FS; this brings unnecessary overhead. In your case, if you have large
> memory on each node, you could use ramfs instead to store the shuffle data.
>
> Thanks
> Saisai
>
> On Wed, Aug 24, 2016 at 8:11 PM, tony....@tendcloud.com <tony....@tendcloud.com> wrote:
>
>> Hi, All,
>> When we run Spark on very large data, Spark performs a shuffle and writes
>> the shuffle data to local disk. Because we have limited local disk
>> capacity, the shuffled data fills up the local disk and the job fails. So
>> is there a way to write the shuffle spill data to HDFS? Or, if we
>> introduce Alluxio into our system, can the shuffled data be written to
>> Alluxio?
>>
>> Thanks and Regards,
>>
>> ------------------------------
>> Tony (阎志涛)
>> Beijing TendCloud Technology Co., Ltd.
>> ------------------------------
>> Email: tony....@tendcloud.com
>> Phone: 13911815695
>> WeChat: zhitao_yan
>> QQ: 4707059
>> Address: Room 602, Aviation Service Building, Building 2, No. 39
>> Dongzhimenwai Street, Dongcheng District, Beijing, 100027
>> ------------------------------
>> TalkingData.com <http://talkingdata.com/> - Let the data speak
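Saisai's ramfs suggestion above can be sketched as follows. This is an illustrative config fragment, not from the thread: the mount point, size, and tmpfs choice are assumptions (tmpfs is used instead of raw ramfs because tmpfs enforces a size cap, while ramfs grows unbounded until the node runs out of memory).

```shell
# Sketch only -- requires root; size and paths are illustrative assumptions.
# 1. Create a RAM-backed mount point for Spark's scratch/shuffle files:
sudo mkdir -p /mnt/spark-local
sudo mount -t tmpfs -o size=32g tmpfs /mnt/spark-local

# 2. Point Spark's local scratch space at it, e.g. in conf/spark-defaults.conf:
#      spark.local.dir    /mnt/spark-local
#    (On YARN this is controlled by the node manager's local dirs instead,
#     so the mount would need to back those directories.)
```

The trade-off is the same one raised in the thread: this avoids disk pressure only if each node genuinely has enough spare RAM for its share of the shuffle data; otherwise the tmpfs fills and writes fail just as the local disk did.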