RE: GraphX partition problem

2014-05-26 Thread Zhicharevich, Alex
I'm not sure about 1.2TB, but I can give it a shot. Is there some way to persist intermediate results to disk? Does the whole graph have to be in memory? Alex

Re: PySpark Mesos random crashes

2014-05-26 Thread Perttu Ranta-aho
Thanks for the comments. Any ideas on how to track down the problem? Since even a smallish data set (one I can run locally) fails on the cluster, I don't expect this to be a memory problem; for the same reason, data- or code-related problems don't seem likely. Could it be something with

maprfs and spark libraries

2014-05-26 Thread nelson
Hi all, I am trying to run Spark over a MapR cluster. I successfully ran several custom applications on a previous non-MapR Hadoop cluster, but I can't get them working on the MapR one. To be more specific, I am not able to read from or write to MFS without running into a serialization error from Java.

K-nearest neighbors search in Spark

2014-05-26 Thread Carter
Hi all, I want to implement a basic K-nearest-neighbors search in Spark, but I am totally new to Scala so I don't know where to start. My data consists of millions of points. For each point, I need to compute its Euclidean distance to the other points, and return the top-K points that are closest
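The per-point computation the poster describes is independent of Spark itself: compute a Euclidean distance to every other point, then keep the K smallest. A minimal pure-Python sketch of that core step (before distributing it, e.g. by broadcasting the point set or using an RDD cartesian join); all names here are illustrative, not from the thread:

```python
import heapq
import math

def euclidean(p, q):
    # Straight-line distance between two equal-length coordinate tuples.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn(query, points, k):
    # Return the k points closest to `query`, excluding the query itself.
    candidates = (p for p in points if p != query)
    return heapq.nsmallest(k, candidates, key=lambda p: euclidean(query, p))

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
print(knn((0.0, 0.0), points, 2))  # the two nearest neighbours of the origin
```

In Spark one would typically map this function over an RDD of points, with the full point set (or a spatial index over it) shared via a broadcast variable, since an all-pairs cartesian join over millions of points is quadratic.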

Sorting data large data- too many open files exception

2014-05-26 Thread Matt Kielo
Hello, I currently have a task that always fails with java.io.FileNotFoundException: [...]/shuffle_0_257_2155 (Too many open files) when I run sorting operations such as distinct, sortByKey, or reduceByKey on a large number of partitions. I'm working with 365 GB of data which is being split into
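This failure is typical of Spark's hash-based shuffle of that era, where each map task writes one intermediate file per reduce partition, so the file count grows multiplicatively with partition count. A quick back-of-the-envelope check, with illustrative numbers rather than ones from the thread:

```python
def shuffle_files(map_tasks, reduce_partitions):
    # Hash shuffle without consolidation: one intermediate file
    # per (map task, reduce partition) pair.
    return map_tasks * reduce_partitions

# e.g. 3000 map tasks shuffling into 3000 partitions
print(shuffle_files(3000, 3000))  # 9,000,000 files, far past a typical default ulimit
```

The usual mitigations at the time were raising the open-file limit (ulimit -n) on the workers and enabling spark.shuffle.consolidateFiles, which makes tasks on the same core reuse shuffle files.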

RE: GraphX partition problem

2014-05-26 Thread Zhicharevich, Alex
Can we do better with Bagel somehow? Control how we store the graph? [Quoting Ankur Dave's reply:] GraphX only performs sequential scans over the edges, so we could in theory

Re: maprfs and spark libraries

2014-05-26 Thread Mayur Rustagi
Did you try standalone mode? You may not see serialization issues in local threaded mode. Serialization errors are unlikely to be caused by the MapR Hadoop version. Regards, Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi

Context switch in spark

2014-05-26 Thread Andras Nemeth
Hi Spark Users, What are the restrictions on using more than one Spark context in a single Scala application? I did not see any documented limitations, but we did observe some bad behavior when trying to do this. The one I'm hitting now is that if I create a local context, stop it and then

Re: maprfs and spark libraries

2014-05-26 Thread Surendranauth Hiraman
My team is successfully running Spark on MapR. However, we add the MapR jars to the SPARK_CLASSPATH on the workers, as well as making sure they are on the classpath of the driver. I'm not sure if we need every jar that we currently add, but below is what we currently use. The important file in

Re: maprfs and spark libraries

2014-05-26 Thread nelson
Thanks for replying, guys. Mayur: Indeed, I tried local mode (spark master: local[5]) before and the application runs well, with no serialization problem. The problem arises as soon as I try to run the app over the cluster. Surendranauth: I just double-checked my SPARK_CLASSPATH from spark-env.sh

Re: maprfs and spark libraries

2014-05-26 Thread Surendranauth Hiraman
We use the MapR rpm and have successfully read and written HDFS data. Are you using custom readers/writers? The relevant stack trace might help. Maybe also try a standard text reader and writer to see if there is a basic issue with accessing MFS? -Suren

Re: maprfs and spark libraries

2014-05-26 Thread Surendranauth Hiraman
When I have stack traces, I usually see the MapR versions of the various Hadoop classes, though maybe that's at a deeper level of the stack trace. If my memory is right, though, this may point to the classpath having the regular Hadoop jars before the MapR jars. My guess is that this is on

Re: Running a spark-submit compatible app in spark-shell

2014-05-26 Thread Andrew Or
Hi Roger, This was due to a bug in the Spark shell code, and is fixed in the latest master (and RC11). Here is the commit that fixed it: https://github.com/apache/spark/commit/8edbee7d1b4afc192d97ba192a5526affc464205. Try it now and it should work. :) Andrew

Re: counting degrees graphx

2014-05-26 Thread daze5112
Excellent, thanks Ankur, looks like what I'm looking for. Only one problem: the line val dists = initDists.pregel[DistanceMap](Map())(vprog, sendMsg, mergeMsg) produces an error: Job aborted: Task 268.0:5 had a not serializable result: java.io.NotSerializableException:

Re: counting degrees graphx

2014-05-26 Thread Ankur Dave
Oh, looks like the Scala Map isn't serializable. I switched the code to use java.util.HashMap, which should work. Ankur http://www.ankurdave.com/

Re: counting degrees graphx

2014-05-26 Thread Ankur Dave
On closer inspection it looks like Map normally is serializable, and it's just a bug in mapValues, so I changed to using the .map(identity) workaround described in https://issues.scala-lang.org/browse/SI-7005. Ankur http://www.ankurdave.com/

Re: maprfs and spark libraries

2014-05-26 Thread Cafe au Lait (icloud)
Hi. On May 26, 2014, at 06:48 PM, nelson nelson.verd...@ysance.com wrote: The test application is built using sbt with the following dependency: org.apache.spark spark-core 0.9.1. You need to remove this dependency; sbt will pack the non-MapR version of the Hadoop classes.
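As an alternative to removing the dependency outright, a common sbt pattern is to mark spark-core as "provided", so the application still compiles against it but the cluster's own jars (here the MapR-bundled ones) are used at runtime. A sketch of the relevant build.sbt line, using the version from the thread; the "provided" scope is a suggested assumption, not advice given in this email:

```scala
// build.sbt: compile against spark-core but do not bundle it,
// so the MapR-provided Hadoop/Spark classes are picked up on the cluster.
libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.1" % "provided"
```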