RE: is there any way to enforce Spark to cache data in all worker nodes (almost equally)?

2015-04-30 Thread Alex
482 MB should be small enough to distribute as a set of broadcast variables. Then you can use Spark's local features to process it. -Alex

is there any way to enforce Spark to cache data in all worker nodes (almost equally)?

2015-04-30 Thread shahab
Hi, I load data from Cassandra into Spark. The entire dataset is about 482 MB, and it is cached as temp tables (7 tables in total). How can I enforce Spark to cache the data on both worker nodes, not only on ONE worker (as happens in my case)? I am using spark 2.1.1 with spark-connector 1.2.0-rc3. I have small

Re: is there any way to enforce Spark to cache data in all worker nodes (almost equally)?

2015-04-30 Thread shahab
Thanks Alex, but 482 MB was just an example size; I am looking for a generic approach that does this without broadcasting. Any idea? best, /Shahab On Thu, Apr 30, 2015 at 4:21 PM, Alex lxv...@gmail.com wrote: 482 MB should be small enough to be distributed as a set of broadcast variables.