Hi Peter, we're using a part of Crail - its core library, called DiSNI ( https://github.com/zrlio/disni/). We couldn't reproduce the results from that blog post. In any case, Crail is more of a platform-level approach (it comes with its own file system), while SparkRDMA is a pluggable approach - it's just a plugin that you can enable or disable for a particular workload, you can use any Hadoop vendor, etc.
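For reference, enabling or disabling it per workload comes down to a few Spark properties - a minimal sketch only (the jar path is a placeholder, and the exact shuffle-manager class name should be double-checked against the SparkRDMA README):

  # spark-defaults.conf, or pass the same via --conf on spark-submit
  spark.driver.extraClassPath    /path/to/spark-rdma.jar
  spark.executor.extraClassPath  /path/to/spark-rdma.jar
  # RDMA shuffle manager class - verify the name in the SparkRDMA README
  spark.shuffle.manager          org.apache.spark.shuffle.rdma.RdmaShuffleManager

Drop those properties and the job falls back to Spark's default sort-based shuffle, which is what makes it easy to compare a particular workload with and without RDMA.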
The best optimization for shuffle between local JVMs would be something like HDFS short-circuit local reads ( https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html), i.e. using a Unix domain socket for local communication, or directly reading the relevant part of the other JVM's shuffle file. But yes, it's not available in Spark out of the box. (A minimal config sketch for short-circuit reads is appended after the quoted thread below.)

Thanks,
Peter Rudenko

Fri, 19 Oct 2018 at 16:54, Peter Liu <peter.p...@gmail.com> wrote:

> Hi Peter,
>
> thank you for the reply and the detailed information! Would this be
> something comparable to Crail? (
> http://crail.incubator.apache.org/blog/2017/11/rdmashuffle.html)
> I was more looking for something simple/quick to make the shuffle between
> the local JVMs faster (like the idea of using a local RAM disk) for my
> simple use case.
>
> Of course, a general and thorough implementation should cover the shuffle
> between nodes as the major focus. Hmm, it looks like there is no such
> implementation within Spark itself yet.
>
> Very much appreciated!
>
> Peter
>
> On Fri, Oct 19, 2018 at 9:38 AM Peter Rudenko <petro.rude...@gmail.com>
> wrote:
>
>> Hey Peter, in the SparkRDMA shuffle plugin (
>> https://github.com/Mellanox/SparkRDMA) we mmap the shuffle file to do
>> Remote Direct Memory Access. If the shuffle data is bigger than RAM,
>> Mellanox NICs support On-Demand Paging, where the OS invalidates
>> translations which are no longer valid due to either non-present pages or
>> mapping changes. So if you have an RDMA-capable NIC (or you can try it on
>> the Azure cloud:
>> https://azure.microsoft.com/en-us/blog/introducing-the-new-hb-and-hc-azure-vm-sizes-for-hpc/
>> ), have a try. For network-intensive apps you should get better
>> performance.
>>
>> Thanks,
>> Peter Rudenko
>>
>> Thu, 18 Oct 2018 at 18:07, Peter Liu <peter.p...@gmail.com> wrote:
>>
>>> I would be very interested in the initial question here:
>>>
>>> Is there a production-level implementation of memory-only shuffle,
>>> configurable similarly to the MEMORY_ONLY and MEMORY_AND_DISK storage
>>> levels, as mentioned in this ticket:
>>> https://github.com/apache/spark/pull/5403 ?
>>>
>>> It would be a quite practical and useful option/feature. Not sure what
>>> the status of that ticket's implementation is?
>>>
>>> Thanks!
>>>
>>> Peter
>>>
>>> On Thu, Oct 18, 2018 at 6:51 AM ☼ R Nair <ravishankar.n...@gmail.com>
>>> wrote:
>>>
>>>> Thanks.. great info. Will try and let all know.
>>>>
>>>> Best
>>>>
>>>> On Thu, Oct 18, 2018, 3:12 AM onmstester onmstester <
>>>> onmstes...@zoho.com> wrote:
>>>>
>>>>> Create the ramdisk:
>>>>> mount tmpfs /mnt/spark -t tmpfs -o size=2G
>>>>>
>>>>> Then point spark.local.dir to the ramdisk; how depends on your
>>>>> deployment strategy - for me it was through the SparkConf object before
>>>>> passing it to SparkContext:
>>>>> conf.set("spark.local.dir","/mnt/spark")
>>>>>
>>>>> To validate that Spark is actually using your ramdisk (by default it
>>>>> uses /tmp), ls the ramdisk after running some jobs and you should see
>>>>> Spark directories (with the date in the directory name) on your ramdisk.
>>>>>
>>>>> ---- On Wed, 17 Oct 2018 18:57:14 +0330 ☼ R Nair
>>>>> <ravishankar.n...@gmail.com> wrote ----
>>>>>
>>>>> What are the steps to configure this?
>>>>> Thanks
>>>>>
>>>>> On Wed, Oct 17, 2018, 9:39 AM onmstester onmstester <
>>>>> onmstes...@zoho.com.invalid> wrote:
>>>>>
>>>>> Hi,
>>>>> I failed to configure Spark for in-memory shuffle, so currently I'm
>>>>> just using a Linux memory-mapped directory (tmpfs) as Spark's working
>>>>> directory, and everything is fast.
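As a footnote to the short-circuit suggestion at the top of this message, a minimal hdfs-site.xml sketch based on the Hadoop page linked there (it goes on both the DataNode and the client, requires the libhadoop native library, and the socket path below is only an example):

  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>

Note this only short-circuits HDFS reads on the same node - as said above, Spark's own shuffle files don't get anything comparable out of the box.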
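And for completeness, onmstester's tmpfs steps rolled into one small runnable snippet - a sketch only, assuming the ramdisk is already mounted at /mnt/spark as shown above and that you run in local/standalone mode, where spark.local.dir from SparkConf is honored:

  import org.apache.spark.{SparkConf, SparkContext}

  object TmpfsShuffleCheck {
    def main(args: Array[String]): Unit = {
      // Assumes the ramdisk exists, e.g.: mount tmpfs /mnt/spark -t tmpfs -o size=2G
      val conf = new SparkConf()
        .setMaster("local[2]")                 // quick local check
        .setAppName("tmpfs-shuffle-check")
        .set("spark.local.dir", "/mnt/spark")  // shuffle/spill files land here instead of /tmp
      val sc = new SparkContext(conf)

      // A tiny job with a shuffle stage (groupByKey), so Spark's scratch
      // directories appear under /mnt/spark and can be checked with `ls`.
      sc.parallelize(1 to 100000)
        .map(i => (i % 10, 1))
        .groupByKey()
        .mapValues(_.size)
        .collect()
        .foreach(println)

      sc.stop()
    }
  }

After the job runs, an `ls /mnt/spark` should show the Spark scratch directories mentioned above.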