Re: Convert DStream[Long] to Long

2015-04-25 Thread Akhil Das
Like this? messages.foreachRDD(rdd => { if (rdd.count() > 0) // Do whatever you want. }) Thanks Best Regards On Fri, Apr 24, 2015 at 11:20 PM, Sergio Jiménez Barrio drarse.a...@gmail.com wrote: Hi, I need to compare whether the count of received messages is 0 or not, but messages.count() returns a

Re: Convert DStream[Long] to Long

2015-04-25 Thread Sergio Jiménez Barrio
It is solved. Thank you! This is more efficient: messages.foreachRDD(rdd => { if (!rdd.isEmpty) // Do whatever you want. }) 2015-04-25 19:21 GMT+02:00 Akhil Das ak...@sigmoidanalytics.com: Like this? messages.foreachRDD(rdd => { if (rdd.count() > 0) // Do whatever you want. }) Thanks Best
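A minimal, self-contained sketch of the pattern settled on in this thread (the socket source and the processing body are placeholders, not from the original messages):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("EmptyBatchCheck").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    val messages = ssc.socketTextStream("localhost", 9999) // placeholder source

    messages.foreachRDD { rdd =>
      // isEmpty short-circuits as soon as it finds one element, unlike
      // count(), which scans the whole RDD on every batch.
      if (!rdd.isEmpty()) {
        println(s"Non-empty batch of ${rdd.count()} messages")
      }
    }

    ssc.start()
    ssc.awaitTermination()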

Re: StreamingContext.textFileStream issue

2015-04-25 Thread Yang Lei
I have no problem running the socket text stream sample in the same environment. Thanks Yang Sent from my iPhone On Apr 25, 2015, at 1:30 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Make sure you have >= 2 cores for your streaming application. Thanks Best Regards On Sat,

Re: DAG

2015-04-25 Thread Akhil Das
Maybe this will give you a good start: https://github.com/apache/spark/pull/2077 Thanks Best Regards On Sat, Apr 25, 2015 at 1:29 AM, Giovanni Paolo Gibilisco gibb...@gmail.com wrote: Hi, I would like to know if it is possible to build the DAG before actually executing the application. My

Re: directory loader in windows

2015-04-25 Thread ayan guha
This code is in Python. Also, I tried with a forward slash at the end, with the same result. On 26 Apr 2015 01:36, Jeetendra Gangele gangele...@gmail.com wrote: Also, if this code is in Scala, why no val on newsY? Is this defined above? loc = "D:\\Project\\Spark\\code\\news\\jsonfeeds" newsY =

Re: StreamingContext.textFileStream issue

2015-04-25 Thread Akhil Das
Make sure you have >= 2 cores for your streaming application. Thanks Best Regards On Sat, Apr 25, 2015 at 3:02 AM, Yang Lei genia...@gmail.com wrote: I hit the same issue, as if the directory has no files at all, when running the sample examples/src/main/python/streaming/hdfs_wordcount.py
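A hedged sketch combining the advice in this thread (the watched directory is a placeholder; "local[2]" is the usual fix for streaming apps starved of task slots):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("TextFileStreamDemo").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // textFileStream only picks up files created in (or atomically moved into)
    // the directory AFTER the stream starts; pre-existing files are ignored,
    // which can look exactly like "the directory has no files at all".
    val lines = ssc.textFileStream("file:///tmp/watched-dir") // placeholder path
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()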

Reply: Re: Spark SQL 1.3.1: java.lang.ClassCastException is thrown

2015-04-25 Thread doovsaid
Even when grouping only by name, the issue (ClassCastException) is still there. - Original Message - From: ayan guha guha.a...@gmail.com To: doovs...@sina.com Cc: user user@spark.apache.org Subject: Re: Spark SQL 1.3.1: java.lang.ClassCastException is thrown Date: 2015-04-25 22:33 Sorry if I am looking

Re: spark1.3.1 using mysql error!

2015-04-25 Thread Anand Mohan
Yes, you would need to add the MySQL driver jar to the Spark driver and executor classpath, either by using the deprecated SPARK_CLASSPATH environment variable (which the latest docs still recommend, even though it's deprecated) like so: export SPARK_CLASSPATH=/usr/share/java/mysql-connector.jar

Re: DAG

2015-04-25 Thread Corey Nolet
Giovanni, The DAG can be walked by calling the dependencies() function on any RDD. It returns a Seq containing the parent RDDs. If you start at the leaves and walk through the parents until dependencies() returns an empty Seq, you ultimately have your DAG. On Sat, Apr 25, 2015 at 1:28 PM, Akhil
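A small sketch of the traversal Corey describes, written against the public RDD API (the printRddLineage name is mine):

    import org.apache.spark.rdd.RDD

    // Each Dependency exposes its parent RDD via .rdd; leaves are RDDs
    // whose dependencies Seq is empty (e.g. the input RDDs).
    def printRddLineage(rdd: RDD[_], indent: String = ""): Unit = {
      println(s"$indent${rdd.getClass.getSimpleName} (id=${rdd.id})")
      rdd.dependencies.foreach(dep => printRddLineage(dep.rdd, indent + "  "))
    }

For a quick textual view of the same information, rdd.toDebugString also prints the lineage.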

Re: How can I retrieve item-pair after calculating similarity using RowMatrix

2015-04-25 Thread Joseph Bradley
It looks like your code is making 1 Row per item, which means that columnSimilarities will compute similarities between users. If you transpose the matrix (or construct it as the transpose), then columnSimilarities should do what you want, and it will return meaningful indices. Joseph On Fri,
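A hedged sketch of Joseph's suggestion, assuming an active SparkContext sc and MLlib's RowMatrix (the toy data is mine):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    // Build the matrix transposed: one row per USER, one column per ITEM,
    // so columnSimilarities() computes item-item similarities and the
    // (i, j) indices of the result are item indices.
    val userRows = sc.parallelize(Seq(
      Vectors.dense(1.0, 0.0, 3.0), // user 0's ratings for items 0..2
      Vectors.dense(0.0, 2.0, 1.0)  // user 1's ratings for items 0..2
    ))
    val sims = new RowMatrix(userRows).columnSimilarities() // CoordinateMatrix
    sims.entries.collect().foreach { e =>
      println(s"items ${e.i} and ${e.j}: similarity ${e.value}")
    }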

Re: KMeans takeSample jobs and RDD cached

2015-04-25 Thread Joseph Bradley
Yes, the count() should be the first task, and the sampling + collecting should be the second task. The first one is probably slow because the RDD being sampled is not yet cached/materialized. K-Means creates some RDDs internally while learning, and since they aren't needed after learning, they
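A minimal sketch of the caching point, assuming MLlib's KMeans (the input path and parameters are placeholders):

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    val data = sc.textFile("hdfs:///path/to/points") // placeholder path
      .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
      .cache() // materialize once, so the first sampling job isn't re-reading input

    val model = KMeans.train(data, 10, 20) // k = 10, maxIterations = 20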

How can I retrieve item-pair after calculating similarity using RowMatrix

2015-04-25 Thread amghost
I have encountered the all-pairs similarity problem in my recommendation system. Thanks to this Databricks blog, it seems RowMatrix may help. However, RowMatrix is a matrix type without meaningful row indices, so I don't know how to retrieve the similarity result after invoking

Re: what is the best way to transfer data from RDBMS to spark?

2015-04-25 Thread ayan guha
Actually, Spark SQL provides a data source. Here it is from the documentation: JDBC To Other Databases - Spark SQL also includes a data source that can read data from other databases using JDBC. This functionality should be preferred over using JdbcRDD
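In Spark 1.3 that source is reached through SQLContext.load with the "jdbc" source name; a sketch (the connection URL and table name are placeholders):

    // Spark 1.3-era API; url and dbtable values are placeholders.
    val jdbcDf = sqlContext.load("jdbc", Map(
      "url" -> "jdbc:postgresql://dbhost:5432/mydb?user=me&password=secret",
      "dbtable" -> "employees"
    ))
    jdbcDf.registerTempTable("employees")
    sqlContext.sql("select count(*) from employees").show()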

Reply: Re: Spark SQL 1.3.1: java.lang.ClassCastException is thrown

2015-04-25 Thread doovsaid
Yeah, same issue. I noticed this issue has not been solved yet. - Original Message - From: Ted Yu yuzhih...@gmail.com To: doovs...@sina.com Cc: user user@spark.apache.org Subject: Re: Spark SQL 1.3.1: java.lang.ClassCastException is thrown Date: 2015-04-25 22:04 Looks like this is related:

Re: Spark SQL 1.3.1: java.lang.ClassCastException is thrown

2015-04-25 Thread ayan guha
Sorry if I am looking at the wrong issue, but your query is wrong: you should group by only on name. On Sat, Apr 25, 2015 at 11:59 PM, doovs...@sina.com wrote: Hi all, when I query Postgresql based on Spark SQL like this: dataFrame.registerTempTable("Employees") val emps =
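A sketch of the corrected query ayan suggests, reusing dataFrame and sqlContext from the original post:

    dataFrame.registerTempTable("Employees")
    // Group only by the non-aggregated column:
    val emps = sqlContext.sql("select name, sum(salary) from Employees group by name")
    emps.take(10).foreach(println)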

directory loader in windows

2015-04-25 Thread ayan guha
Hi, I am facing this weird issue. I am on Windows, and I am trying to load all files within a folder. Here is my code: loc = "D:\\Project\\Spark\\code\\news\\jsonfeeds" newsY = sc.textFile(loc) print newsY.count() Even this simple code fails. I have tried with giving exact file names,
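One common workaround on Windows (hedged; the thread does not confirm this as the resolution) is to address the folder as a file: URI with forward slashes. In Scala, mirroring the Python snippet above:

    val loc = "file:///D:/Project/Spark/code/news/jsonfeeds"
    val newsY = sc.textFile(loc) // textFile accepts a directory and reads all files in it
    println(newsY.count())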

Re: directory loader in windows

2015-04-25 Thread Jeetendra Gangele
loc = "D:\\Project\\Spark\\code\\news\\jsonfeeds\\" On 25 April 2015 at 20:49, Jeetendra Gangele gangele...@gmail.com wrote: Hi Ayan, can you try the line below? loc = "D:\\Project\\Spark\\code\\news\\jsonfeeds" On 25 April 2015 at 20:08, ayan guha guha.a...@gmail.com wrote: Hi I am facing this

Re: directory loader in windows

2015-04-25 Thread Jeetendra Gangele
Hi Ayan, can you try the line below? loc = "D:\\Project\\Spark\\code\\news\\jsonfeeds" On 25 April 2015 at 20:08, ayan guha guha.a...@gmail.com wrote: Hi, I am facing this weird issue. I am on Windows, and I am trying to load all files within a folder. Here is my code: loc =

KMeans takeSample jobs and RDD cached

2015-04-25 Thread podioss
Hi, I am running the k-means algorithm with initialization mode set to random and various dataset sizes and values for clusters, and I have a question regarding the takeSample job of the algorithm. More specifically, I notice that in every application there are two sampling jobs. The first one is consuming

Spark SQL 1.3.1: java.lang.ClassCastException is thrown

2015-04-25 Thread doovsaid
Hi all, When I query Postgresql based on Spark SQL like this: dataFrame.registerTempTable("Employees") val emps = sqlContext.sql("select name, sum(salary) from Employees group by name, salary") monitor { emps.take(10) .map(row => (row.getString(0),

Re: Spark SQL 1.3.1: java.lang.ClassCastException is thrown

2015-04-25 Thread Ted Yu
Looks like this is related: https://issues.apache.org/jira/browse/SPARK-5456 On Sat, Apr 25, 2015 at 6:59 AM, doovs...@sina.com wrote: Hi all, When I query Postgresql based on Spark SQL like this: dataFrame.registerTempTable("Employees") val emps = sqlContext.sql("select name,

Re: directory loader in windows

2015-04-25 Thread Jeetendra Gangele
Extra forward slash at the end; I have sometimes seen this kind of issue. On 25 April 2015 at 20:50, Jeetendra Gangele gangele...@gmail.com wrote: loc = "D:\\Project\\Spark\\code\\news\\jsonfeeds\\" On 25 April 2015 at 20:49, Jeetendra Gangele gangele...@gmail.com wrote: Hi Ayan, can you try

Re: directory loader in windows

2015-04-25 Thread Jeetendra Gangele
Also, if this code is in Scala, why no val on newsY? Is this defined above? loc = "D:\\Project\\Spark\\code\\news\\jsonfeeds" newsY = sc.textFile(loc) print newsY.count() On 25 April 2015 at 20:08, ayan guha guha.a...@gmail.com wrote: Hi I am facing this weird issue. I am on Windows, and I

Re: what is the best way to transfer data from RDBMS to spark?

2015-04-25 Thread Sujeevan
If your use case is more about querying the RDBMS and then bringing the results into Spark to do some analysis, then the Spark SQL JDBC data source API http://www.sparkexpert.com/2015/03/28/loading-database-data-into-spark-using-data-sources-api/ is the best. If your use case is to bring the entire data to