Subject: Re: SparkR Jobs Hanging in collectPartitions
Date: Wednesday, May 27, 2015 at 8:26 PM
To: Aleksander Eskilson <alek.eskil...@cerner.com>
Cc: user@spark.apache.org
Could you try to see which phase is causing the hang? i.e., if you do a
count() after flatMap, does that work correctly? My guess is that the hang
is somehow related to the data not fitting in the R process memory, but it's
hard to say without more diagnostic information.
Thanks
Shivaram
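The phase-by-phase check suggested above can be sketched as follows (a hypothetical diagnostic, assuming a running SparkR context `sc`, a path `src2`, and the private `SparkR:::` RDD API used elsewhere in this thread; the whitespace-splitting function is illustrative):

```r
# Materialize each stage separately to find which one hangs.
corpus <- SparkR:::textFile(sc, src2)

# Step 1: does reading the corpus alone complete?
print(SparkR:::count(corpus))

# Step 2: does the flatMap complete? count() forces evaluation on the
# workers without pulling the full dataset back into the driver's R process.
words <- SparkR:::flatMap(corpus, function(line) strsplit(line, " ")[[1]])
print(SparkR:::count(words))

# Step 3: only then try collecting, which is where memory pressure in the
# R process would show up.
# result <- SparkR:::collect(SparkR:::distinct(words))
```

If step 2 succeeds but the collect hangs, that would point at the driver-side R process rather than the transformation itself.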
On Tue, May
I’ve been attempting to run a SparkR translation of a similar Scala job that
identifies words from a corpus that do not exist in a newline-delimited dictionary.
The R code is:
dict <- SparkR:::textFile(sc, src1)
corpus <- SparkR:::textFile(sc, src2)
words <- distinct(SparkR:::flatMap(corpus,