Re: SparkR Jobs Hanging in collectPartitions

2015-05-29 Thread Eskilson,Aleksander
Could you try to see which phase is causing the hang? That is, if you do a count() after flatMap, does that work correctly? My guess is that the hang is somehow related to data not fitting in the R process

Re: SparkR Jobs Hanging in collectPartitions

2015-05-29 Thread Shivaram Venkataraman
Date: Wednesday, May 27, 2015 at 8:26 PM To: Aleksander Eskilson alek.eskil...@cerner.com Cc: user@spark.apache.org Subject: Re: SparkR Jobs Hanging in collectPartitions Could you try to see which phase is causing the hang? That is, if you do a count() after flatMap, does

Re: SparkR Jobs Hanging in collectPartitions

2015-05-27 Thread Shivaram Venkataraman
Could you try to see which phase is causing the hang? That is, if you do a count() after flatMap, does that work correctly? My guess is that the hang is somehow related to data not fitting in the R process memory, but it's hard to say without more diagnostic information. Thanks Shivaram On Tue, May

SparkR Jobs Hanging in collectPartitions

2015-05-26 Thread Eskilson,Aleksander
I’ve been attempting to run a SparkR translation of a similar Scala job that identifies words in a corpus that do not exist in a newline-delimited dictionary. The R code is:

dict <- SparkR:::textFile(sc, src1)
corpus <- SparkR:::textFile(sc, src2)
words <- distinct(SparkR:::flatMap(corpus,
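For readers without a cluster at hand, the result the job is meant to compute (distinct corpus words minus dictionary entries) can be sketched in base R. This is a local, non-Spark illustration with made-up sample data; the variable names (`dict_text`, `corpus_text`, `not_in_dict`) and the whitespace tokenization are assumptions, and it does not reconstruct the truncated SparkR code in the message.

```r
# Local base-R sketch (NOT SparkR): find distinct corpus words that are
# absent from a newline-delimited dictionary. All sample data is made up.
dict_text   <- "apple\nbanana\ncherry"                          # hypothetical dictionary file contents
corpus_text <- c("apple pie", "banana split and cherry tart")   # hypothetical corpus lines

# One dictionary word per line, as in a newline-delimited file
dict_words <- strsplit(dict_text, "\n", fixed = TRUE)[[1]]

# Split corpus lines on whitespace (an assumption; the original job's
# tokenizer is not shown) and keep distinct words, analogous to flatMap + distinct
corpus_words <- unique(unlist(strsplit(corpus_text, "[[:space:]]+")))

# Words that appear in the corpus but not in the dictionary
not_in_dict <- setdiff(corpus_words, dict_words)
print(not_in_dict)
```

This version holds everything in a single R session's memory, which is exactly what a distributed job is meant to avoid; it only makes the intended result concrete.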