Re: KMeans Input Format

2014-08-09 Thread AlexanderRiggers
Thank you for your help. After restructuring my code to Seans input, it worked without changing Spark context. I now took the same file format just a bigger file(2.7GB) from s3 to my cluster with 4 c3.xlarge instances and Spark 1.0.2. Unluckly my task freezes again after a short time. I tried it

Re: KMeans Input Format

2014-08-08 Thread AlexanderRiggers
Thanks for your answers. I added some lines to my code and it went through, but I get a error message for my compute cost function now... scala val WSSSE = model.computeCost(train)14/08/08 15:48:42 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(driver, 192.168.0.33, 49242, 0)

Re: KMeans Input Format

2014-08-07 Thread AlexanderRiggers
Thanks for your answers. The dataset is only 400MB, so I shouldn't run out of memory. I restructured my code now, because I forgot to cache my dataset and set down number of iterations to 2, but still get kicked out of Spark. Did I cache the data wrong (sorry not an expert): scala import

GraphX Pagerank application

2014-08-06 Thread AlexanderRiggers
I want to use pagerank on a 3GB textfile, which contains a bipartite list with variables id and brand. Example: id,brand 86246,15343 86246,27873 86246,14647 86246,55172 86246,3293 86246,2820 86246,3830 86246,2820 86246,5603 86246,72482 To perform the page rank I have to create a graph object,

Re: Terminal freeze during SVM

2014-07-16 Thread AlexanderRiggers
so I need to reconfigure my sparkcontext this way: val conf = new SparkConf() .setMaster(local) .setAppName(CountingSheep) .set(spark.executor.memory, 1g) .set(spark.akka.frameSize,20) val sc = new SparkContext(conf) And start a new cluster

Re: Terminal freeze during SVM

2014-07-10 Thread AlexanderRiggers
Tried the newest branch, but still get stuck on the same task: (kill) runJob at SlidingRDD.scala:74 -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Terminal-freeze-during-SVM-Broken-pipe-tp9022p9304.html Sent from the Apache Spark User List mailing list

Re: Terminal freeze during SVM

2014-07-09 Thread AlexanderRiggers
By latest branch you mean Apache Spark 1.0.0 ? and what do you mean by master? Because I am using v 1.0.0 - Alex -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Terminal-freeze-during-SVM-Broken-pipe-tp9022p9208.html Sent from the Apache Spark User List

Sample datasets for MLlib and Graphx

2014-07-03 Thread AlexanderRiggers
Hello! I want to play around with several different cluster settings and measure performances for MLlib and GraphX and was wondering if anybody here could hit me up with datasets for these applications from 5GB onwards? I mostly interested in SVM and Triangle Count, but would be glad for any

Re: Sample datasets for MLlib and Graphx

2014-07-03 Thread AlexanderRiggers
Nick Pentreath wrote Take a look at Kaggle competition datasets - https://www.kaggle.com/competitions I was looking for files in LIBSVM format and never found something on Kaggle in bigger size. Most competitions I ve seen need data processing and feature generating, but maybe I ve to take a