Re: [Spark 2.x Core] .collect() size limit

2018-04-30 Thread Irving Duran
I don't think there is a magic number, so I would say that it will depend on how big your dataset is and the size of your worker(s). Thank you, Irving Duran On Sat, Apr 28, 2018 at 10:41 AM klrmowse wrote: > i am currently trying to find a workaround for the Spark

Re: [Spark 2.x Core] .collect() size limit

2018-04-30 Thread Vadim Semenov
`.collect` returns an Array, and arrays can't have more than Int.MaxValue elements; in most JVMs the practical limit is even lower: `Int.MaxValue - 8`. So that puts an upper limit on the result size. However, you can create an Array of Arrays, and so on, which is basically limitless, albeit with some gymnastics.
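A minimal sketch of the "Array of Arrays" idea in plain Java (independent of Spark): a logical sequence longer than a single array could hold is split into fixed-size chunks, and a logical index is mapped to (chunk, offset). The chunk size here is a tiny illustrative value; a real chunk could be up to `Integer.MAX_VALUE - 8` elements.

```java
public class ChunkedArrayDemo {
    // Illustrative chunk size; in practice up to Integer.MAX_VALUE - 8.
    static final int CHUNK = 4;

    public static void main(String[] args) {
        long totalElements = 10; // logical length; as a long it may exceed int range
        int numChunks = (int) ((totalElements + CHUNK - 1) / CHUNK);
        long[][] chunks = new long[numChunks][];
        for (int c = 0; c < numChunks; c++) {
            // last chunk may be shorter than CHUNK
            int len = (int) Math.min(CHUNK, totalElements - (long) c * CHUNK);
            chunks[c] = new long[len];
        }
        // A logical index i addresses chunks[(int)(i / CHUNK)][(int)(i % CHUNK)]
        long i = 9;
        chunks[(int) (i / CHUNK)][(int) (i % CHUNK)] = 42;
        System.out.println(numChunks);      // prints 3 (ceil(10 / 4))
        System.out.println(chunks[2][1]);   // prints 42
    }
}
```

The same indexing trick is what "basically limitless, albeit with some gymnastics" amounts to: total capacity is bounded by heap, not by the per-array element limit.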

Re: [Spark 2.x Core] .collect() size limit

2018-04-30 Thread Deepak Goel
AM. So, > if general guidelines are followed, *virtual memory* is moot.
> From: Deepak Goel <deic...@gmail.com>
> Date: Saturday, April 28, 2018 at 12:58 PM
> To: Stephen Boesch <java...@gmail.com>
> Cc: klrmowse <klrmo...@gmail.com>, "user @s

Re: [Spark 2.x Core] .collect() size limit

2018-04-30 Thread Lalwani, Jayesh
mo...@gmail.com>, "user @spark" <user@spark.apache.org>
Subject: Re: [Spark 2.x Core] .collect() size limit
I believe the virtualization of memory happens at the OS layer, hiding it completely from the application layer. On Sat, 28 Apr 2018, 22:22 Stephen Boesch, <java...@gmail

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Mark Hamstra
spark.driver.maxResultSize (see http://spark.apache.org/docs/latest/configuration.html) On Sat, Apr 28, 2018 at 8:41 AM, klrmowse wrote: > i am currently trying to find a workaround for the Spark application i am > working on so that it does not have to use .collect() > > but,
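For reference, the driver-side cap Mark points to can be raised (or set to 0 for unlimited, at your own risk); the value below is illustrative, not a recommendation. In spark-defaults.conf:

```properties
# Limit on the total serialized size of results for each action (e.g. collect).
# Default in Spark 2.x is 1g; the job is aborted if results exceed this.
spark.driver.maxResultSize  4g
```

The same setting can be passed per job, e.g. `--conf spark.driver.maxResultSize=4g` on spark-submit. Note that raising it only moves the failure point: the driver heap (`spark.driver.memory`) still has to be large enough to actually hold the collected results.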

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Deepak Goel
I believe the virtualization of memory happens at the OS layer, hiding it completely from the application layer. On Sat, 28 Apr 2018, 22:22 Stephen Boesch, wrote: > While it is certainly possible to use VM I have seen in a number of places > warnings that collect() results must

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Stephen Boesch
While it is certainly possible to use VM, I have seen in a number of places warnings that collect() results must be able to fit in memory. I'm not sure if that applies to *all* Spark calculations, but at the very least each of the specific collect()s that are performed would need to be

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Deepak Goel
There is such a thing as *virtual memory*. On Sat, 28 Apr 2018, 21:19 Stephen Boesch, wrote: > Do you have a machine with terabytes of RAM? afaik collect() requires > RAM - so that would be your limiting factor. > > 2018-04-28 8:41 GMT-07:00 klrmowse : > >>

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Stephen Boesch
Do you have a machine with terabytes of RAM? AFAIK collect() requires RAM, so that would be your limiting factor. 2018-04-28 8:41 GMT-07:00 klrmowse : > i am currently trying to find a workaround for the Spark application i am > working on so that it does not have to use
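A back-of-the-envelope check of Stephen's point, in plain Java (the row count, average serialized row size, and driver heap below are made-up numbers for illustration): whatever collect() returns has to fit in the driver's heap all at once.

```java
public class CollectSizeEstimate {
    public static void main(String[] args) {
        long rows = 500_000_000L;            // hypothetical dataset size
        long avgRowBytes = 64;               // hypothetical serialized row size
        long driverHeap = 16L * 1024 * 1024 * 1024; // hypothetical 16 GiB driver heap

        long needed = rows * avgRowBytes;    // 32,000,000,000 bytes
        System.out.println("needed GiB (approx): " + needed / (1024L * 1024 * 1024)); // prints 29
        System.out.println("fits in driver heap: " + (needed < driverHeap));          // prints false
    }
}
```

When the estimate comes out "false", the usual alternatives in the thread apply: don't collect() at all, process the data distributed, or pull it to the driver incrementally.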