Re: Parquet problems

2015-07-22 Thread Michael Misiewicz
For what it's worth, my data set has around 85 columns in Parquet format as well. I have tried bumping the permgen up to 512m but I'm still getting errors in the driver thread. On Wed, Jul 22, 2015 at 1:20 PM, Jerry Lam chiling...@gmail.com wrote: Hi guys, I noticed that too. Anders, can you

Re: spark.executor.memory and spark.driver.memory have no effect in yarn-cluster mode (1.4.x)?

2015-07-22 Thread Michael Misiewicz
--driver-memory 5g Let me know if that answers your question, -Andrew 2015-07-22 12:38 GMT-07:00 Michael Misiewicz mmisiew...@gmail.com: Hi group, I seem to have encountered a weird problem with 'spark-submit' and manually setting SparkConf values in my applications. It seems like setting

spark.executor.memory and spark.driver.memory have no effect in yarn-cluster mode (1.4.x)?

2015-07-22 Thread Michael Misiewicz
Hi group, I seem to have encountered a weird problem with 'spark-submit' and manually setting SparkConf values in my applications. It seems like setting the configuration values spark.executor.memory and spark.driver.memory doesn't have any effect when they are set from within my application

Re: Parquet problems

2015-07-22 Thread Michael Misiewicz
Hi Anders, Did you ever get to the bottom of this issue? I'm encountering it too, but only in yarn-cluster mode running on Spark 1.4.0. I was thinking of trying 1.4.1 today. Michael On Thu, Jun 25, 2015 at 5:52 AM, Anders Arpteg arp...@spotify.com wrote: Yes, both the driver and the

Re: Folding an RDD in order

2014-10-17 Thread Michael Misiewicz
is a single Map object rather than an RDD. If this map can be very large (say you have billions of users), then aggregate may OOM. On 10/17/14 12:01 AM, Michael Misiewicz wrote: Thanks for the suggestion! That does look really helpful, I see what you mean about it being more general than fold. I
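
To make the point about aggregate returning a single in-memory map concrete, here is a minimal plain-Python sketch of the seqOp/combOp pattern, with no Spark dependency. The partition layout, the (user, amount) record shape, and the names `seq_op`, `comb_op`, and `aggregate` are assumptions made for illustration; this models the idea and is not Spark's implementation.

```python
from functools import reduce


def seq_op(acc, record):
    """Fold one (user, amount) record into a per-partition dict."""
    user, amount = record
    new_acc = dict(acc)  # copy so the shared zero value is never mutated
    new_acc[user] = new_acc.get(user, 0) + amount
    return new_acc


def comb_op(left, right):
    """Merge two per-partition dicts into one."""
    merged = dict(left)
    for user, amount in right.items():
        merged[user] = merged.get(user, 0) + amount
    return merged


def aggregate(partitions, zero, seq_op, comb_op):
    """Model of aggregate: fold each partition, then merge the partials."""
    partials = [reduce(seq_op, part, zero) for part in partitions]
    return reduce(comb_op, partials, zero)


# Hypothetical data: two partitions of (user, amount) records.
partitions = [
    [("alice", 10), ("bob", 5)],
    [("alice", 7), ("carol", 1)],
]
totals = aggregate(partitions, {}, seq_op, comb_op)
print(totals)
```

The final `totals` holds one entry per distinct user in a single process, which is why a very large number of users could exhaust memory.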

Re: Folding an RDD in order

2014-10-17 Thread Michael Misiewicz
) => (time, user) -> amount }.sortByKey.aggregate(…) On 10/17/14 10:44 PM, Michael Misiewicz wrote: Thank you for sharing this Cheng! This is fantastic. I was able to implement it and it seems like it's working quite well. I'm definitely on the right track now! I'm still having a small
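
The idea of keying records by (time, user), sorting by that key, and then folding in order can be sketched in plain Python as follows. This is an illustration under assumptions, not the code from the thread: the record shape ((time, user), amount), the sample data, and the function name `running_totals` are all hypothetical, and a local `sorted` stands in for a distributed sort.

```python
def running_totals(records):
    """Sort ((time, user), amount) records by key, then fold them in order.

    Returns a list of (time, user, running_total_for_user) tuples.
    """
    totals = {}
    history = []
    # Sorting by the (time, user) key fixes the order of the fold,
    # even though the sums themselves are grouped by user only.
    for (time, user), amount in sorted(records, key=lambda kv: kv[0]):
        totals[user] = totals.get(user, 0) + amount
        history.append((time, user, totals[user]))
    return history


# Hypothetical, deliberately unordered input.
records = [
    ((3, "alice"), 4),
    ((1, "bob"), 2),
    ((2, "alice"), 1),
    ((1, "alice"), 5),
]
history = running_totals(records)
for row in history:
    print(row)
```

The sort key and the fold key differ here on purpose: ordering uses (time, user), while the accumulator is indexed by user alone.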

Folding an RDD in order

2014-10-16 Thread Michael Misiewicz
Hi, I'm working on a problem where I'd like to sum items in an RDD in order (approximately). I am currently trying to implement this using a fold, but I'm having some issues because the sorting key of my data is not the same as the folding key for my data. I have data that looks like this:

Re: Folding an RDD in order

2014-10-16 Thread Michael Misiewicz
Michael, I'm not sure I fully understood your question, but I think RDD.aggregate can be helpful in your case. You can see it as a more general version of fold. Cheng On 10/16/14 11:15 PM, Michael Misiewicz wrote: Hi, I'm working on a problem where I'd like to sum items in an RDD

Re: Folding an RDD in order

2014-10-16 Thread Michael Misiewicz
partitioner and pass that into aggregateByKey on the re-keyed data being aggregated? On Thu, Oct 16, 2014 at 12:01 PM, Michael Misiewicz mmisiew...@gmail.com wrote: Thanks for the suggestion! That does look really helpful, I see what you mean about it being more general than fold. I think I