I would be very interested in the initial question here: is there a production-level implementation of memory-only shuffle, configurable in the same spirit as the storage levels (MEMORY_ONLY, MEMORY_AND_DISK), as mentioned in this ticket: https://github.com/apache/spark/pull/5403 ?
It would be quite a practical and useful option/feature. Does anyone know the status of this ticket's implementation? Thanks!

Peter

On Thu, Oct 18, 2018 at 6:51 AM ☼ R Nair <ravishankar.n...@gmail.com> wrote:

> Thanks, great info. Will try it and let all know.
>
> Best
>
> On Thu, Oct 18, 2018, 3:12 AM onmstester onmstester <onmstes...@zoho.com> wrote:
>
>> Create the ramdisk:
>> mount tmpfs /mnt/spark -t tmpfs -o size=2G
>>
>> Then point spark.local.dir to the ramdisk. How to do this depends on your deployment strategy; for me it was through the SparkConf object before passing it to SparkContext:
>> conf.set("spark.local.dir","/mnt/spark")
>>
>> To validate that Spark is actually using your ramdisk (by default it uses /tmp), ls the ramdisk after running some jobs; you should see Spark directories (with the date in the directory name) on your ramdisk.
>>
>> ---- On Wed, 17 Oct 2018 18:57:14 +0330 ☼ R Nair <ravishankar.n...@gmail.com> wrote ----
>>
>> What are the steps to configure this? Thanks
>>
>> On Wed, Oct 17, 2018, 9:39 AM onmstester onmstester <onmstes...@zoho.com.invalid> wrote:
>>
>> Hi,
>> I failed to configure Spark for in-memory shuffle, so for now I am just using a Linux memory-backed directory (tmpfs) as Spark's working directory, and everything is fast.
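For reference, this is roughly how I understand the tmpfs approach from the thread would be wired up end to end. It is only a sketch based on the steps quoted above, not something I have verified in production; the /mnt/spark path, the app name, and the sample job are just placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    object TmpfsShuffleCheck {
      def main(args: Array[String]): Unit = {
        // Point spark.local.dir at the tmpfs mount created earlier with
        // "mount tmpfs /mnt/spark -t tmpfs -o size=2G"; the path is just
        // the example from the thread.
        val conf = new SparkConf()
          .setAppName("TmpfsShuffleCheck")
          .setMaster("local[*]") // for a quick local test; drop when using spark-submit to a cluster
          .set("spark.local.dir", "/mnt/spark")
        val sc = new SparkContext(conf)

        // Any job with a shuffle will do; reduceByKey forces shuffle
        // files to be written under spark.local.dir.
        val counts = sc.parallelize(1 to 1000000)
          .map(i => (i % 10, 1))
          .reduceByKey(_ + _)
        counts.collect().foreach(println)

        // While or after this runs, `ls /mnt/spark` should show spark-*
        // (and blockmgr-*) directories instead of them appearing in /tmp.
        sc.stop()
      }
    }

One caveat I am not sure applies to every deployment mode: on a cluster, spark.local.dir set in SparkConf can be overridden by the cluster manager (SPARK_LOCAL_DIRS in standalone mode, LOCAL_DIRS on YARN), so the tmpfs path may need to be configured there instead of in the application.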