I would be very interested in the initial question here: is there a production-level implementation of memory-only shuffle, configurable in the same spirit as the storage levels (MEMORY_ONLY, MEMORY_AND_DISK), as mentioned in this ticket: https://github.com/apache/spark/pull/5403 ?
It would be quite a practical and useful option/feature. Does anyone know the status of this ticket's implementation? Thanks!

Peter

On Thu, Oct 18, 2018 at 6:51 AM ☼ R Nair <ravishankar.n...@gmail.com> wrote:

> Thanks, great info. Will try it and let all know.
>
> Best
>
> On Thu, Oct 18, 2018, 3:12 AM onmstester onmstester <onmstes...@zoho.com> wrote:
>
>> Create the ramdisk:
>> mount tmpfs /mnt/spark -t tmpfs -o size=2G
>>
>> Then point spark.local.dir to the ramdisk. How to do this depends on your deployment strategy; for me it was through the SparkConf object before passing it to SparkContext:
>> conf.set("spark.local.dir","/mnt/spark")
>>
>> To validate that Spark is actually using your ramdisk (by default it uses /tmp), ls the ramdisk after running some jobs; you should see Spark directories (with the date in the directory name) on your ramdisk.
>>
>> ---- On Wed, 17 Oct 2018 18:57:14 +0330 ☼ R Nair <ravishankar.n...@gmail.com> wrote ----
>>
>> What are the steps to configure this? Thanks
>>
>> On Wed, Oct 17, 2018, 9:39 AM onmstester onmstester <onmstes...@zoho.com.invalid> wrote:
>>
>> Hi,
>> I failed to configure Spark for in-memory shuffle, so for now I am just using a Linux memory-backed directory (tmpfs) as Spark's working directory, and everything is fast.
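For reference, this is roughly how I understand the tmpfs approach from the thread would be wired up end to end. It is only a sketch based on the steps quoted above, not something I have verified in production; the /mnt/spark path, the app name, and the sample job are just placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    object TmpfsShuffleCheck {
      def main(args: Array[String]): Unit = {
        // Point spark.local.dir at the tmpfs mount created earlier with
        // "mount tmpfs /mnt/spark -t tmpfs -o size=2G"; the path is just
        // the example from the thread.
        val conf = new SparkConf()
          .setAppName("TmpfsShuffleCheck")
          .setMaster("local[*]") // for a quick local test; drop when using spark-submit to a cluster
          .set("spark.local.dir", "/mnt/spark")
        val sc = new SparkContext(conf)

        // Any job with a shuffle will do; reduceByKey forces shuffle
        // files to be written under spark.local.dir.
        val counts = sc.parallelize(1 to 1000000)
          .map(i => (i % 10, 1))
          .reduceByKey(_ + _)
        counts.collect().foreach(println)

        // While or after this runs, `ls /mnt/spark` should show spark-*
        // (and blockmgr-*) directories instead of them appearing in /tmp.
        sc.stop()
      }
    }

One caveat I am not sure applies to every deployment mode: on a cluster, spark.local.dir set in SparkConf can be overridden by the cluster manager (SPARK_LOCAL_DIRS in standalone mode, LOCAL_DIRS on YARN), so the tmpfs path may need to be configured there instead of in the application.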