For what it's worth, my data set has around 85 columns in Parquet format as
well. I have tried bumping the permgen up to 512m but I'm still getting
errors in the driver thread.
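For reference, I'm raising it along these lines (a sketch, not my exact
command; this assumes Java 7, where PermGen still exists):

    spark-submit --driver-java-options "-XX:MaxPermSize=512m" ...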
On Wed, Jul 22, 2015 at 1:20 PM, Jerry Lam chiling...@gmail.com wrote:
Hi guys,
I noticed that too. Anders, can you try passing
--driver-memory 5g
to spark-submit on the command line instead?
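For example (a sketch -- the class and jar names are placeholders):

    spark-submit --master yarn-cluster --driver-memory 5g \
      --executor-memory 4g --class com.example.MyApp myapp.jar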
Let me know if that answers your question,
-Andrew
2015-07-22 12:38 GMT-07:00 Michael Misiewicz mmisiew...@gmail.com:
Hi group,
I seem to have encountered a weird problem with 'spark-submit' and manually
setting sparkconf values in my applications.
It seems like the configuration values spark.executor.memory
and spark.driver.memory don't have any effect when they are set from
within my application.
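For concreteness, this is roughly the shape of the code (a sketch with
placeholder names, not my actual app):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("MyApp")
      .set("spark.executor.memory", "4g") // appears to be ignored
      .set("spark.driver.memory", "5g")   // also appears to be ignored
    val sc = new SparkContext(conf)

As I understand it, spark.driver.memory can't take effect from here
anyway, since the driver JVM is already running by the time this conf is
constructed, but that would only explain half of it.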
Hi Anders,
Did you ever get to the bottom of this issue? I'm encountering it too, but
only in yarn-cluster mode running on Spark 1.4.0. I was thinking of
trying 1.4.1 today.
Michael
On Thu, Jun 25, 2015 at 5:52 AM, Anders Arpteg arp...@spotify.com wrote:
Yes, both the driver and the
The result of the aggregate call is a single Map object rather than an RDD.
If this map can be very large (say you have billions of users), then
aggregate may OOM.
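For concreteness, the kind of aggregation meant here might look like this
(a sketch; the sorted RDD and the (time, user) -> amount record shape are
assumed from the earlier snippet):

    import scala.collection.mutable

    // sorted: RDD[((Long, String), Double)], keyed by (time, user)
    val totals = sorted.aggregate(mutable.Map.empty[String, Double])(
      // seqOp: fold one record into the per-partition map
      (acc, rec) => {
        val ((_, user), amount) = rec
        acc(user) = acc.getOrElse(user, 0.0) + amount
        acc
      },
      // combOp: merge two per-partition maps
      (m1, m2) => {
        m2.foreach { case (u, v) => m1(u) = m1.getOrElse(u, 0.0) + v }
        m1
      }
    )
    // totals is an ordinary Map materialized on the driver --
    // with billions of distinct users it may not fit in memory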
On 10/17/14 12:01 AM, Michael Misiewicz wrote:
Thanks for the suggestion! That does look really helpful, I see what
you mean about it being more general than fold. I
  case (time, user, amount) => (time, user) -> amount
}.sortByKey().aggregate(…)
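Spelled out a little more fully, the same idea might look like this (a
sketch; the sample data and local master are made up):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._

    val sc = new SparkContext(
      new SparkConf().setAppName("sketch").setMaster("local[*]"))

    // hypothetical records: (time, user, amount)
    val records = sc.parallelize(Seq(
      (1L, "alice", 3.0), (2L, "bob", 1.5), (3L, "alice", 2.0)))

    val keyed = records.map { case (time, user, amount) => (time, user) -> amount }
    // sortByKey range-partitions, so earlier (time, user) keys
    // land in earlier partitions
    val sorted = keyed.sortByKey()
    // sorted.aggregate(...) can then fold each partition in key order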
On 10/17/14 10:44 PM, Michael Misiewicz wrote:
Thank you for sharing this, Cheng! This is fantastic. I was able to
implement it and it seems like it's working quite well. I'm definitely on
the right track now!
I'm still having a small
Hi,
I'm working on a problem where I'd like to sum items in an RDD *in order*
(approximately). I am currently trying to implement this using a fold, but
I'm having some issues because the sorting key of my data is not the same
as the folding key for my data. I have data that looks like this:
Michael,
I'm not sure I fully understood your question, but I think RDD.aggregate
can be helpful in your case. You can see it as a more general version of
fold.
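The signature is rdd.aggregate(zeroValue)(seqOp, combOp), and unlike fold,
the accumulator may have a different type from the RDD's elements. A toy
example (a sketch, unrelated to your data; nums is a hypothetical RDD[Int]):

    // sum and count in one pass: the accumulator is an (Int, Int)
    // pair even though the elements are plain Ints
    val (sum, count) = nums.aggregate((0, 0))(
      (acc, n) => (acc._1 + n, acc._2 + 1), // seqOp: fold in one element
      (a, b) => (a._1 + b._1, a._2 + b._2)  // combOp: merge partition results
    )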
Cheng
On 10/16/14 11:15 PM, Michael Misiewicz wrote:
Could I create a custom partitioner and pass
that into aggregateByKey on the re-keyed data being aggregated?
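If it helps, roughly what I mean (a sketch; the time-bucket logic is
made up):

    import org.apache.spark.Partitioner

    // hypothetical partitioner: route (time, user) keys by time bucket
    // so each partition covers a bounded slice of time
    class TimeBucketPartitioner(val numPartitions: Int, bucketMs: Long)
        extends Partitioner {
      def getPartition(key: Any): Int = key match {
        case (time: Long, _) => ((time / bucketMs) % numPartitions).toInt
        case _               => 0
      }
    }

    // keyed: RDD[((Long, String), Double)] as in the earlier sketch
    val summed = keyed.aggregateByKey(0.0, new TimeBucketPartitioner(8, 60000L))(
      _ + _, // seqOp: add an amount into this key's running sum
      _ + _  // combOp: merge partial sums for the same key
    )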