Extracting k-means cluster values along with centers?

2015-06-12 Thread Minnow Noir
Greetings. I have been following some of the online tutorials for Spark k-means clustering. I would like to dump all the cluster values and their centroids to a text file so I can explore the data. I have the clusters as such: val clusters = KMeans.train(parsedData, numClusters,
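
A minimal sketch of one way to do this in spark-shell (MLlib's Spark 1.x API; the toy parsedData and the output path "kmeans-assignments" are placeholders, not from the original post). KMeansModel.predict gives each point's cluster index, and clusterCenters holds the centroids as a local array:

  import org.apache.spark.mllib.clustering.KMeans
  import org.apache.spark.mllib.linalg.Vectors

  // Toy stand-in for the tutorial's parsedData (an RDD[Vector]).
  val parsedData = sc.parallelize(Seq(
    Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
    Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)))

  val clusters = KMeans.train(parsedData, 2, 20)

  // Pair each point with its assigned cluster index and save as text.
  parsedData
    .map(v => s"${clusters.predict(v)},${v.toArray.mkString(",")}")
    .saveAsTextFile("kmeans-assignments")

  // The centroids live in a local array on the driver.
  clusters.clusterCenters.zipWithIndex.foreach { case (c, i) =>
    println(s"center $i: ${c.toArray.mkString(",")}")
  }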

Format RDD/SchemaRDD contents to screen?

2015-05-29 Thread Minnow Noir
I'm trying to debug query results inside spark-shell, but find it cumbersome to save them to a file and then use file-system utilities to explore the results, and .foreach(print) tends to interleave the results among the myriad log messages. take() and collect() truncate. Is there a simple way to present
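
One common workaround, sketched below under the assumption that spark-shell's sqlContext and a registered "products" table are available: take(n) returns a local Array, so the printing runs on the driver and never interleaves with executor log output, and the noisy log4j loggers can be quieted directly:

  import org.apache.log4j.{Level, Logger}

  // Silence the INFO chatter so results are readable.
  Logger.getLogger("org").setLevel(Level.WARN)
  Logger.getLogger("akka").setLevel(Level.WARN)

  // take(n) brings n rows back to the driver as a local Array,
  // so println runs locally, in order, with nothing interleaved.
  val rows = sqlContext.sql("SELECT * FROM products").take(50)
  rows.foreach(println)

  // Crude fixed-width rendering of each Row for easier scanning.
  rows.foreach { r =>
    println(r.toSeq.map(v => f"${String.valueOf(v)}%-20s").mkString("|"))
  }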

Query REST web service with Spark?

2015-03-31 Thread Minnow Noir
We have some data on Hadoop that needs to be augmented with data available to us only via a REST service. We're using Spark to search for, and correct, missing data. Even though there are a lot of records to scour for missing data, the total number of calls to the service is expected to be low, so
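
A sketch of the usual pattern for this, with a made-up Rec type and endpoint URL: do the lookups inside mapPartitions so any per-partition setup (an HTTP client, say) is shared across records, and only call out for records that are actually missing data, which keeps the call volume low:

  import scala.io.Source

  // Hypothetical record: an id plus a possibly missing field.
  case class Rec(id: String, value: Option[String])
  val records = sc.parallelize(Seq(Rec("1", Some("ok")), Rec("2", None)))

  val repaired = records.mapPartitions { it =>
    // Per-partition setup (e.g. a pooled HTTP client) would go here.
    it.map {
      case Rec(id, None) =>
        // Endpoint URL is illustrative, not a real service.
        val body = Source.fromURL(s"http://lookup.example.com/rec/$id").mkString
        Rec(id, Some(body.trim))
      case rec => rec
    }
  }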

Arguments/parameters in Spark shell scripts?

2015-03-29 Thread Minnow Noir
How does one consume parameters passed to a Scala script via spark-shell -i? 1. If I use an object with a main() method, the println outputs nothing, as if main() were never called: import org.apache.spark.SparkContext import org.apache.spark.SparkContext._ import org.apache.spark.SparkConf object Test {
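
For what it's worth, spark-shell -i simply feeds the file to the REPL, so a main() method gets defined but never invoked. One workaround (a sketch; the MY_INPUT variable name is made up) is to read parameters from environment variables or -D system properties and call the entry point explicitly at the bottom of the script:

  import org.apache.spark.SparkContext

  object Test {
    def run(sc: SparkContext, input: String): Unit = {
      println(s"processing $input")
      println(sc.textFile(input).count())
    }
  }

  // sc already exists inside spark-shell; pass the parameter at launch:
  //   MY_INPUT=/data/foo.txt spark-shell -i script.scala
  val input = sys.env.getOrElse("MY_INPUT", "default.txt")
  Test.run(sc, input)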

Convert Spark SQL table to RDD in Scala / error: value toFloat is a not a member of Any

2015-03-22 Thread Minnow Noir
I'm following an online tutorial written in Python and trying to convert a Spark SQL table object to an RDD in Scala. The Spark SQL code just loads a simple table from a CSV file. The tutorial says to convert the table to an RDD. The Python is products_rdd = sqlContext.table(products).map(lambda
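
A sketch of the Scala equivalent, assuming the tutorial's "products" table with a name column at index 0 and a price column at index 1: indexing a Row in Scala yields Any, which is exactly why .toFloat fails; go through a String conversion or the Row's typed getters instead:

  // sqlContext is provided by spark-shell; the "products" table and its
  // column layout are the tutorial's, assumed here.
  val productsRdd = sqlContext.table("products").map { row =>
    // row(1) is typed Any, hence "value toFloat is not a member of Any".
    // Convert via String, or use a typed getter such as row.getString(0).
    (row.getString(0), row(1).toString.toFloat)
  }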