I don't think there is a magic number, so I would say that it will depend
on how big your dataset is and the size of your worker(s).
Thank You,
Irving Duran
On Sat, Apr 28, 2018 at 10:41 AM klrmowse wrote:
> i am currently trying to find a workaround for the Spark application i am
> working on so that it does not have to use .collect()
`.collect` returns an Array, and arrays can't have more than `Int.MaxValue`
elements; in most JVMs the practical limit is even lower: `Int.MaxValue - 8`.
So that puts an upper limit on a single collect; however, you can create an
Array of Arrays, and so on, making it basically limitless, albeit with some
gymnastics.
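The "Array of Arrays" workaround above can be sketched generically. Here is a minimal illustration (in Python rather than Scala, and independent of Spark) of chunking a result so that no single array exceeds a fixed element cap; the tiny cap of 3 stands in for the JVM's real limit of roughly `Int.MaxValue - 8`:

```python
# Minimal sketch of the "Array of Arrays" idea: split a large result into
# chunks so that no single array exceeds a fixed element cap. Each inner
# list stays under the cap, while the outer "array of arrays" can hold
# arbitrarily many elements in total.

def chunked(items, cap):
    """Yield successive chunks of at most `cap` elements."""
    chunk = []
    for item in items:
        chunk.append(item)
        if len(chunk) == cap:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

result = list(chunked(range(8), cap=3))
# result is a list of lists, none longer than the cap
```

In Spark itself, `rdd.glom().collect()` returns one array per partition, which produces exactly this kind of nested structure without a manual chunking step.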
> So, if general guidelines are followed, **virtual memory** is moot.
>
> *From: *Deepak Goel <deic...@gmail.com>
> *Date: *Saturday, April 28, 2018 at 12:58 PM
> *To: *Stephen Boesch <java...@gmail.com>
> *Cc: *klrmowse <klrmo...@gmail.com>, "user @spark" <user@spark.apache.org>
> *Subject: *Re: [Spark 2.x Core] .collect() size limit
I believe the virtualization of memory happens at the OS layer hiding it
completely from the application layer
On Sat, 28 Apr 2018, 22:22 Stephen Boesch, <java...@gmail.com> wrote:
spark.driver.maxResultSize
http://spark.apache.org/docs/latest/configuration.html
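For reference, `spark.driver.maxResultSize` caps the total serialized size of results across all partitions for a single action such as `collect()`; the job is aborted if the limit is exceeded. A sketch of raising it (the 4g value is arbitrary, not a recommendation):

```properties
# spark-defaults.conf — default is 1g; 0 means unlimited,
# which risks an out-of-memory error on the driver
spark.driver.maxResultSize  4g
```

The same setting can be passed on the command line with `spark-submit --conf spark.driver.maxResultSize=4g`.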
On Sat, Apr 28, 2018 at 8:41 AM, klrmowse wrote:
> i am currently trying to find a workaround for the Spark application i am
> working on so that it does not have to use .collect()
>
> but,
While it is certainly possible to use VM I have seen in a number of places
warnings that collect() results must be able to fit in memory. I'm not
sure if that applies to *all* spark calculations: but at the very least
each of the specific collect()'s that are performed would need to fit in
memory.
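The warning above is the usual motivation for avoiding `collect()` entirely: instead of materializing the whole result on the driver, stream the computation over the data. The point generalizes beyond Spark; a minimal Python sketch of the contrast:

```python
# Contrast between "collect everything, then compute" and streaming the
# computation. The first needs memory proportional to the data size; the
# second needs constant memory, which is the usual reason to prefer
# reduce-style actions (e.g. rdd.reduce, rdd.foreachPartition) over collect().

def sum_by_collecting(source):
    everything = list(source)      # analogous to collect(): O(n) memory
    return sum(everything)

def sum_by_streaming(source):
    total = 0
    for x in source:               # analogous to a reduce on the executors
        total += x
    return total

assert sum_by_collecting(range(1000)) == sum_by_streaming(range(1000))
```

In Spark itself, `take(n)`, `rdd.reduce(...)`, or writing results out with `saveAsTextFile` are the standard ways to sidestep a full `collect()`.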
There is such a thing as *virtual memory*
On Sat, 28 Apr 2018, 21:19 Stephen Boesch, wrote:
> Do you have a machine with terabytes of RAM? afaik collect() requires
> RAM - so that would be your limiting factor.
>
> 2018-04-28 8:41 GMT-07:00 klrmowse :
Do you have a machine with terabytes of RAM? afaik collect() requires RAM
- so that would be your limiting factor.
2018-04-28 8:41 GMT-07:00 klrmowse :
> i am currently trying to find a workaround for the Spark application i am
> working on so that it does not have to use .collect()