I’m not sure about 1.2TB, but I can give it a shot.
Is there some way to persist intermediate results to disk? Does the whole
graph have to be in memory?
Alex
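A hedged sketch of the disk-spill option, not from the original thread: assuming `graph` is an existing GraphX Graph, persisting it at MEMORY_AND_DISK lets partitions spill to disk instead of failing when they do not fit in memory.

    import org.apache.spark.storage.StorageLevel

    // Assumption: `graph` is an already-constructed Graph; persist it
    // before materializing so partitions can spill to local disk.
    val spillable = graph.persist(StorageLevel.MEMORY_AND_DISK)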
From: Ankur Dave [mailto:ankurd...@gmail.com]
Sent: Monday, May 26, 2014 12:23 AM
To: user@spark.apache.org
Subject: Re: GraphX partition
Thanks for the comments. Any ideas on how to track down the problem? Since even a
smallish data set (one I can run locally) fails on the cluster, I
don't expect this to be a memory problem; for the same reason, data- or code-
related problems don't seem likely either. Could it be something with
Hi all,
I am trying to run Spark on a MapR cluster. I successfully ran several
custom applications on a previous non-MapR Hadoop cluster, but I can't get
them working on the MapR one. To be more specific, I am not able to read
from or write to MFS without running into a Java serialization error.
Hi all,
I want to implement a basic K-nearest-neighbors search in Spark, but I
am totally new to Scala, so I don't know where to start. My data consists
of millions of points. For each point, I need to compute its Euclidean
distance to the other points, and return the top-K points that are closest
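A minimal brute-force sketch, not the poster's code: all names and data below are made up, `sc` is an existing SparkContext, and the cartesian step is O(n^2), so this is only a starting point, not something that scales to millions of points as-is.

    import org.apache.spark.SparkContext._

    case class Point(id: Long, vec: Array[Double])

    // Squared Euclidean distance; the square root is monotonic, so
    // skipping it preserves the nearest-neighbor ordering.
    def sqDist(a: Array[Double], b: Array[Double]): Double =
      a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

    val k = 2
    val points = sc.parallelize(Seq(
      Point(1, Array(0.0, 0.0)),
      Point(2, Array(1.0, 1.0)),
      Point(3, Array(3.0, 4.0)),
      Point(4, Array(0.5, 0.2))))

    // All pairs except self-pairs, then keep the k closest per point.
    val knn = points.cartesian(points)
      .filter { case (a, b) => a.id != b.id }
      .map { case (a, b) => (a.id, (sqDist(a.vec, b.vec), b.id)) }
      .groupByKey()
      .mapValues(_.toSeq.sortBy(_._1).take(k))

    knn.collect().foreach(println)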
Hello,
I currently have a task that always fails with java.io.FileNotFoundException:
[...]/shuffle_0_257_2155 (Too many open files) when I run sorting
operations such as distinct, sortByKey, or reduceByKey on a large number of
partitions.
I'm working with 365 GB of data which is being split into
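Two mitigations are commonly suggested for this error, sketched here as assumptions rather than a confirmed fix for this particular job: raise the OS open-file limit (e.g. ulimit -n), and enable shuffle file consolidation so Spark keeps one set of shuffle files per core instead of one per map task.

    import org.apache.spark.{SparkConf, SparkContext}

    // Hedged sketch: consolidation reduces the number of shuffle files
    // held open at once during wide operations like sortByKey.
    val conf = new SparkConf()
      .setAppName("shuffle-tuning")
      .set("spark.shuffle.consolidateFiles", "true")
    val sc = new SparkContext(conf)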
Can we do better with Bagel somehow? Control how we store the graph?
From: Ankur Dave [mailto:ankurd...@gmail.com]
Sent: Monday, May 26, 2014 12:13 PM
To: user@spark.apache.org
Subject: Re: GraphX partition problem
GraphX only performs sequential scans over the edges, so we could in theory
Did you try standalone mode? You may not see serialization issues in
local threaded mode.
Serialization errors are unlikely to be caused by the MapR Hadoop version.
Regards
Mayur
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
Hi Spark Users,
What are the restrictions on using more than one SparkContext in a
single Scala application? I did not see any documented limitations, but we
did observe some bad behavior when trying to do this. The one I'm hitting
now is that if I create a local context, stop it, and then
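A hypothetical reproduction of the sequence described above, just to make the pattern concrete; the names are made up.

    import org.apache.spark.SparkContext

    val first = new SparkContext("local", "first-context")
    first.parallelize(1 to 100).count()
    first.stop()

    // The second context, created after the first is stopped, is where
    // the reported bad behavior shows up; Spark does not guarantee
    // support for multiple contexts in one JVM.
    val second = new SparkContext("local", "second-context")
    second.parallelize(1 to 100).count()
    second.stop()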
My team is successfully running Spark on MapR.
However, we add the MapR jars to the SPARK_CLASSPATH on the workers, as
well as making sure they are on the classpath of the driver.
I'm not sure whether we need every jar that we currently add, but below is
what we currently use. The important file in
Thanks for replying, guys.
Mayur:
Indeed, I tried local mode (spark master: local[5]) before and the
application runs fine, with no serialization problem. The problem arises as
soon as I try to run the app on the cluster.
Surendranauth:
I just double-checked my SPARK_CLASSPATH in spark-env.sh
We use the MapR RPM and have successfully read and written HDFS data.
Are you using custom readers/writers? The relevant stack trace might help.
Maybe also try a standard text reader and writer to see if there is a basic
issue with accessing MFS?
-Suren
On Mon, May 26, 2014 at 11:31 AM,
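A minimal smoke test along the lines Suren suggests, with placeholder paths; `sc` is an existing SparkContext and maprfs:// is the MapR-FS URI scheme.

    // Plain text round trip against MapR-FS, no custom readers/writers.
    val lines = sc.textFile("maprfs:///tmp/smoke-test-in.txt")
    println(lines.count())
    lines.saveAsTextFile("maprfs:///tmp/smoke-test-out")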
When I have stack traces, I usually see the MapR versions of the various
Hadoop classes, though maybe that's at a deeper level of the stack trace.
If my memory is right, though, this may point to the classpath having the
regular Hadoop jars before the MapR jars. My guess is that this
is on
Hi Roger,
This was due to a bug in the Spark shell code, and is fixed in the latest
master (and RC11). Here is the commit that fixed it:
https://github.com/apache/spark/commit/8edbee7d1b4afc192d97ba192a5526affc464205.
Try it now and it should work. :)
Andrew
2014-05-26 10:35 GMT+02:00 Perttu
Excellent, thanks Ankur, this looks like what I'm looking for. Only one
problem: the line
val dists = initDists.pregel[DistanceMap](Map())(vprog, sendMsg, mergeMsg)
produces an error:
Job aborted: Task 268.0:5 had a not serializable result:
java.io.NotSerializableException:
Oh, looks like the Scala Map isn't serializable. I switched the code to use
java.util.HashMap, which should work.
Ankur http://www.ankurdave.com/
On Mon, May 26, 2014 at 3:21 PM, daze5112 david.zeel...@ato.gov.au wrote:
Excellent, thanks Ankur, this looks like what I'm looking for. Only one problem
On closer inspection it looks like Map is normally serializable, and it's
just a bug in mapValues, so I switched to the .map(identity)
workaround described in https://issues.scala-lang.org/browse/SI-7005.
Ankur http://www.ankurdave.com/
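An illustration of the workaround with made-up values: mapValues returns a lazy view that does not serialize, and .map(identity) forces it back into a plain immutable Map.

    val m = Map("a" -> 1, "b" -> 2)

    // Not serializable: mapValues returns a view (SI-7005), which can
    // surface as java.io.NotSerializableException inside Spark tasks.
    val view = m.mapValues(_ + 1)

    // Serializable: map(identity) materializes a plain immutable Map.
    val fixed = m.mapValues(_ + 1).map(identity)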
Hi,
On May 26, 2014, at 06:48 PM, nelson nelson.verd...@ysance.com wrote:
The test application is built using sbt with the following dependency:
- org.apache.spark spark-core 0.9.1
You need to remove this dependency; otherwise sbt will package the non-MapR
version of the Hadoop classes.
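A hedged sketch of one common way to do this in build.sbt: mark spark-core as "provided" so it is on the compile classpath but not bundled, letting the MapR jars already on the cluster win. The version and scope here are assumptions based on the thread.

    // build.sbt
    libraryDependencies +=
      "org.apache.spark" %% "spark-core" % "0.9.1" % "provided"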