Re: Read Parquet file from scala directly

2015-03-10 Thread Akhil Das
Here's a Java version https://github.com/cloudera/parquet-examples/tree/master/MapReduce It shouldn't be hard to port that to Scala. Thanks Best Regards On Mon, Mar 9, 2015 at 9:55 PM, Shuai Zheng szheng.c...@gmail.com wrote: Hi All, I have a lot of parquet files, and I try to open them
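
For illustration, reading Parquet directly from Scala via Spark SQL might look like the sketch below (assumes Spark 1.2-era APIs; the path and table name are placeholders):

    // Minimal sketch, assuming Spark 1.2.x; the path is a placeholder.
    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext("local[*]", "read-parquet")
    val sqlContext = new SQLContext(sc)
    val records = sqlContext.parquetFile("hdfs:///data/part.parquet") // returns a SchemaRDD
    records.registerTempTable("records")
    sqlContext.sql("SELECT COUNT(*) FROM records").collect().foreach(println)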

Re: saveAsTextFile extremely slow near finish

2015-03-10 Thread Akhil Das
Don't you think 1000 partitions is too few for 160GB of data? Also you could try using KryoSerializer and enabling RDD compression. Thanks Best Regards On Mon, Mar 9, 2015 at 11:01 PM, mingweili0x m...@spokeo.com wrote: I'm basically running a sorting using spark. The spark program will read from HDFS,
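
A sketch of the settings being suggested here, applied via SparkConf (the app name is a placeholder):

    // Enable Kryo serialization and RDD compression (Spark 1.2-era config keys).
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("hdfs-sort")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.rdd.compress", "true")
    val sc = new SparkContext(conf)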

Re: Spark-on-YARN architecture

2015-03-10 Thread Harika Matha
Thanks for the quick reply. I am running the application in YARN client mode, and I want to run the AM on the same node as the RM in order to use the node which would otherwise run the AM. How can I get the AM to run on the same node as the RM? On Tue, Mar 10, 2015 at 3:49 PM, Sean Owen so...@cloudera.com wrote:

Registering custom UDAFs with HiveContext in SparkSQL, how?

2015-03-10 Thread shahab
Hi, I need to develop a couple of UDAFs and use them in SparkSQL. While UDFs can be registered as functions in HiveContext, I could not find any documentation on how UDAFs can be registered in the HiveContext. So far what I have found is to make a JAR file out of the developed UDAF class, and

Re: Spark-on-YARN architecture

2015-03-10 Thread Sean Owen
I suppose you just provision enough resource to run both on that node... but it really shouldn't matter. The RM and your AM aren't communicating heavily. On Tue, Mar 10, 2015 at 10:23 AM, Harika Matha matha.har...@gmail.com wrote: Thanks for the quick reply. I am running the application in

[SparkSQL] Reuse HiveContext to different Hive warehouse?

2015-03-10 Thread Haopu Wang
I'm using the Spark 1.3.0 RC3 build with Hive support. In the Spark shell, I want to reuse the HiveContext instance with different warehouse locations. Below are the steps for my test (assume I have loaded a file into table src). == 15/03/10 18:22:59 INFO SparkILoop: Created sql context (with

Spark-on-YARN architecture

2015-03-10 Thread Harika
Hi all, I have a Spark cluster set up on YARN with 4 nodes (1 master and 3 slaves). When I run an application, YARN chooses, at random, one Application Master from among the slaves. This means that my final computation is being carried out on only two slaves. This decreases the performance of the

Re: saveAsTextFile extremely slow near finish

2015-03-10 Thread Sean Owen
This is more of an aside, but why repartition this data instead of letting it define partitions naturally? You will end up with a similar number. On Mar 9, 2015 5:32 PM, mingweili0x m...@spokeo.com wrote: I'm basically running a sorting using spark. The spark program will read from HDFS, sort

Re: Spark-on-YARN architecture

2015-03-10 Thread Sean Owen
In YARN cluster mode, there is no Spark master, since YARN is your resource manager. Yes, you could force your AM somehow to run on the same node as the RM, but why -- what do you think is faster about that? On Tue, Mar 10, 2015 at 10:06 AM, Harika matha.har...@gmail.com wrote: Hi all, I have Spark

Re: error on training with logistic regression sgd

2015-03-10 Thread Peng Xia
Hi, Can anyone give an idea about this? I just did some Google searching; it seems related to the 2GB limitation on block size, https://issues.apache.org/jira/browse/SPARK-1476. The whole process is that: 1. load the data 2. convert each line of data into labeled points using some feature hashing

RE: sc.textFile() on windows cannot access UNC path

2015-03-10 Thread java8964
I think the workaround is clear: use JDK 7 and implement your own saveAsRemoteWinText() using java.nio.file.Path. Yong From: ningjun.w...@lexisnexis.com To: java8...@hotmail.com; user@spark.apache.org Subject: RE: sc.textFile() on windows cannot access UNC path Date: Tue, 10 Mar 2015 03:02:37
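
A rough sketch of what such a helper could look like (saveAsRemoteWinText is the hypothetical name from the thread; collecting to the driver only works for output small enough to fit there):

    import java.nio.charset.StandardCharsets
    import java.nio.file.{Files, Paths}
    import org.apache.spark.rdd.RDD

    // Hypothetical helper: write an RDD of lines to a UNC path from the driver,
    // e.g. """\\server\share\out.txt""", using java.nio.file (JDK 7+).
    def saveAsRemoteWinText(rdd: RDD[String], uncPath: String): Unit = {
      val bytes = rdd.collect().mkString("\n").getBytes(StandardCharsets.UTF_8)
      Files.write(Paths.get(uncPath), bytes)
    }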

RE: Does anyone know how to deploy a custom UDAF jar file in SparkSQL?

2015-03-10 Thread Cheng, Hao
You can add the additional jar when submitting your job, something like: ./bin/spark-submit --jars xx.jar … More options can be listed by just typing ./bin/spark-submit From: shahab [mailto:shahab.mok...@gmail.com] Sent: Tuesday, March 10, 2015 8:48 PM To: user@spark.apache.org Subject: Does

RE: Registering custom UDAFs with HiveContext in SparkSQL, how?

2015-03-10 Thread Cheng, Hao
Currently, Spark SQL doesn't provide an interface for developing custom UDTFs, but it can work seamlessly with Hive UDTFs. I am working on the UDTF refactoring for Spark SQL; hopefully we will provide a Hive-independent UDTF soon after that. From: shahab [mailto:shahab.mok...@gmail.com] Sent:

Does anyone know how to deploy a custom UDAF jar file in SparkSQL?

2015-03-10 Thread shahab
Hi, Does anyone know how to deploy a custom UDAF jar file in SparkSQL? Where should I put the jar file so SparkSQL can pick it up and make it accessible to SparkSQL applications? I do not use spark-shell; instead I want to use it in a Spark application. best, /Shahab

Re: FW: RE: distribution of receivers in spark streaming

2015-03-10 Thread Du Li
Thanks TD and Jerry for suggestions. I have done some experiments and worked out a reasonable solution to the problem of spreading receivers to a set of worker hosts. It would be a bit too tedious to document in email. So I discuss the solution in a blog: 

Re: SchemaRDD: SQL Queries vs Language Integrated Queries

2015-03-10 Thread Tobias Pfeiffer
Hi, On Tue, Mar 10, 2015 at 2:13 PM, Cesar Flores ces...@gmail.com wrote: I am new to the SchemaRDD class, and I am trying to decide in using SQL queries or Language Integrated Queries ( https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD ). Can someone

Re: Why spark master consumes 100% CPU when we kill a spark streaming app?

2015-03-10 Thread Saisai Shao
Probably the cleanup work, like cleaning shuffle files and tmp files, costs too much CPU; if we run Spark Streaming for a long time, lots of files will be generated, so cleaning up these files before the app exits could be time-consuming. Thanks Jerry 2015-03-11 10:43 GMT+08:00 Tathagata Das

Re: Solve least square problem of the form min norm(A x - b)^2 + lambda * n * norm(x)^2?

2015-03-10 Thread Jaonary Rabarisoa
I'm trying to play with the implementation of the least squares solver (Ax = b) in mlmatrix.TSQR, where A is a 5*1024 matrix and b is a 5*10 matrix. It works, but I notice that it's 8 times slower than the implementation given in the latest ampcamp:

Re: Setting up Spark with YARN on EC2 cluster

2015-03-10 Thread roni
Hi Harika, Did you get any solution for this? I want to use YARN, but the spark-ec2 script does not support it. Thanks -Roni

Re: ANSI Standard Supported by the Spark-SQL

2015-03-10 Thread Michael Armbrust
Spark SQL supports a subset of HiveQL: http://spark.apache.org/docs/latest/sql-programming-guide.html#compatibility-with-apache-hive On Mon, Mar 9, 2015 at 11:32 PM, Ravindra ravindra.baj...@gmail.com wrote: From the archives in this user list, It seems that Spark-SQL is yet to achieve SQL 92

Re: Compilation error

2015-03-10 Thread Tathagata Das
If you are using tools like SBT/Maven/Gradle/etc, they figure out all the recursive dependencies and include them in the classpath. I haven't touched Eclipse in years so I am not sure off the top of my head what's going on instead. Just in case you only downloaded the spark-streaming_2.10.jar

Re: Registering custom UDAFs with HiveContext in SparkSQL, how?

2015-03-10 Thread shahab
Thanks Hao, But my question concerns UDAF (user-defined aggregation function), not UDTF (user-defined table-generating function). I would appreciate it if you could point me to some starting point on UDAF development in Spark. Thanks Shahab On Tuesday, March 10, 2015, Cheng, Hao hao.ch...@intel.com wrote:

Re: Compilation error

2015-03-10 Thread Tathagata Das
You have to include Scala libraries in the Eclipse dependencies. TD On Tue, Mar 10, 2015 at 10:54 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I am trying out streaming example as documented and I am using spark 1.2.1 streaming from maven for Java. When I add this code I get compilation

Re: Can't cache RDD of collaborative filtering on MLlib

2015-03-10 Thread Yuichiro Sakamoto
Thank you for your reply. 1. Which version of Spark do you use now? I use Spark 1.2.0 (CDH 5.3.1). 2. Why don't you check whether `productJavaRDD` and `userJavaRDD` are cached with the Web UI or not? I checked the Spark UI. The task was stopped at 1/2 (Succeeded/Total tasks). Here is

Re: Compilation error

2015-03-10 Thread Mohit Anchlia
How do I do that? I haven't used Scala before. Also, the linking page doesn't mention that: http://spark.apache.org/docs/1.2.0/streaming-programming-guide.html#linking On Tue, Mar 10, 2015 at 10:57 AM, Sean Owen so...@cloudera.com wrote: It means you do not have Scala library classes in your

Re: Why spark master consumes 100% CPU when we kill a spark streaming app?

2015-03-10 Thread Tathagata Das
Do you have event logging enabled? That could be the problem. The Master tries to aggressively recreate the web UI of the completed job from the event logs (when they are enabled), causing the Master to stall. I created a JIRA for this. https://issues.apache.org/jira/browse/SPARK-6270 On Tue, Mar 10,

Re: Is it possible to use windows service to start and stop spark standalone cluster

2015-03-10 Thread Silvio Fiorito
Have you tried Apache Commons Daemon? http://commons.apache.org/proper/commons-daemon/procrun.html From: Wang, Ningjun (LNG-NPV) Date: Tuesday, March 10, 2015 at 11:47 PM To: user@spark.apache.org Subject: Is it possible to use windows service to start and stop spark

S3 SubFolder Write Issues

2015-03-10 Thread cpalm3
Hi All, I am hoping someone has seen this issue before with S3, as I haven't been able to find a solution for this problem. When I try to save a text file to S3 in a subfolder, it only ever writes out to the bucket-level folder and produces block-level generated file names and not my output

Re: SQL with Spark Streaming

2015-03-10 Thread Tobias Pfeiffer
Hi, On Wed, Mar 11, 2015 at 9:33 AM, Cheng, Hao hao.ch...@intel.com wrote: Intel has a prototype for doing this, SaiSai and Jason are the authors. Probably you can ask them for some materials. The GitHub repository is here: https://github.com/intel-spark/stream-sql Also, what I did is
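
A common pattern for this, independent of either prototype, is to run SQL over each micro-batch with foreachRDD. A minimal sketch assuming Spark 1.2-era APIs (host, port, and schema are placeholders):

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    case class Event(user: String, value: Int)

    val sc = new SparkContext("local[2]", "stream-sql")
    val ssc = new StreamingContext(sc, Seconds(10))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD // implicit RDD[Event] -> SchemaRDD

    ssc.socketTextStream("localhost", 9999)
      .map(_.split(","))
      .map(a => Event(a(0), a(1).toInt))
      .foreachRDD { rdd =>
        rdd.registerTempTable("events") // re-registered on every batch
        sqlContext.sql("SELECT user, SUM(value) FROM events GROUP BY user")
          .collect().foreach(println)
      }

    ssc.start()
    ssc.awaitTermination()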

Re: ANSI Standard Supported by the Spark-SQL

2015-03-10 Thread Ravindra
Thanks Michael, That helps. So just to summarise: we should not make any assumptions about Spark being fully compliant with any SQL standard until announced by the community, and should maintain the status quo as you have suggested. Regards, Ravi. On Tue, Mar 10, 2015 at 11:14 PM Michael

Is it possible to use windows service to start and stop spark standalone cluster

2015-03-10 Thread Wang, Ningjun (LNG-NPV)
We are using a Spark standalone cluster on Windows 2008 R2. I can start Spark clusters by opening a command prompt and running the following: bin\spark-class.cmd org.apache.spark.deploy.master.Master bin\spark-class.cmd org.apache.spark.deploy.worker.Worker spark://mywin.mydomain.com:7077 I can stop

Re: sparse vector operations in Python

2015-03-10 Thread Joseph Bradley
There isn't a great way currently. The best option is probably to convert to scipy.sparse column vectors and add using scipy. Joseph On Mon, Mar 9, 2015 at 4:21 PM, Daniel, Ronald (ELS-SDG) r.dan...@elsevier.com wrote: Hi, Sorry to ask this, but how do I compute the sum of 2 (or more) mllib

Why spark master consumes 100% CPU when we kill a spark streaming app?

2015-03-10 Thread Xuelin Cao
Hey, Recently we found in our cluster that when we kill a Spark streaming app, the whole cluster cannot respond for 10 minutes. We investigated the master node and found that the master process consumes 100% CPU when we kill the Spark streaming app. How could this happen? Did

Pyspark not using all cores

2015-03-10 Thread htailor
Hi All, I need some help with a problem in pyspark which is causing a major issue. Recently I've noticed that the behaviour of the python daemons on the worker nodes for compute-intensive tasks has changed from using all the available cores to using only a single core. On each worker node, 8

Re: Spark Streaming testing strategies

2015-03-10 Thread Marcin Kuthan
Hi Holden, Thanks for pointing me to the package. Indeed the StreamingSuiteBase trait hides a lot, especially regarding clock manipulation. Did you encounter problems with concurrent test execution from SBT (SPARK-2243)? I had to disable parallel execution and configure SBT to use a separate JVM

Compilation error on JavaPairDStream

2015-03-10 Thread Mohit Anchlia
I am getting the following error. When I look at the sources it seems to be a Scala source, but I'm not sure why it's complaining about it. The method map(Function<String,R>) in the type JavaDStream<String> is not applicable for the arguments (new PairFunction<String,String,Integer>(){}) And my code has

java.io.InvalidClassException: org.apache.spark.rdd.PairRDDFunctions; local class incompatible: stream classdesc

2015-03-10 Thread Manas Kar
Hi, I have a CDH 5.3.2 (Spark 1.2) cluster. I am getting a local class incompatible exception for my spark application during an action. All my classes are case classes (to the best of my knowledge). Appreciate any help. Exception in thread "main" org.apache.spark.SparkException: Job aborted due to

Writing wide parquet file in Spark SQL

2015-03-10 Thread kpeng1
Hi All, I am currently trying to write a very wide file into Parquet using Spark SQL. I have records with 100K columns that I am trying to write out, but of course I am running into space issues (out of memory: heap space). I was wondering if there are any tweaks or workarounds for this. I am

Re: Compilation error

2015-03-10 Thread Mohit Anchlia
I navigated to the Maven dependencies and found the Scala library. I also found Tuple2.class, and when I click on it in Eclipse I get: java.util.zip.ZipException: invalid LOC header (bad signature) at java.util.zip.ZipFile.read(Native Method) I am wondering if I should

Re: Spark Streaming testing strategies

2015-03-10 Thread Holden Karau
On Tue, Mar 10, 2015 at 1:18 PM, Marcin Kuthan marcin.kut...@gmail.com wrote: Hi Holden, Thanks for pointing me to the package. Indeed the StreamingSuiteBase trait hides a lot, especially regarding clock manipulation. Did you encounter problems with concurrent test execution from SBT

Re: Compilation error on JavaPairDStream

2015-03-10 Thread Sean Owen
Ah, that's a typo in the example: use words.mapToPair. I can make a little PR to fix that. On Tue, Mar 10, 2015 at 8:32 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am getting the following error. When I look at the sources it seems to be a Scala source, but not sure why it's complaining about

Spark 1.3 SQL Type Parser Changes?

2015-03-10 Thread Nitay Joffe
In Spark 1.2 I used to be able to do this: scala> org.apache.spark.sql.hive.HiveMetastoreTypes.toDataType("struct<int:bigint>") res30: org.apache.spark.sql.catalyst.types.DataType = StructType(List(StructField(int,LongType,true))) That is, the name of a column can be a keyword like int. This is no

SchemaRDD: SQL Queries vs Language Integrated Queries

2015-03-10 Thread Cesar Flores
I am new to the SchemaRDD class, and I am trying to decide between using SQL queries and language-integrated queries ( https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD ). Can someone tell me what the main difference between the two approaches is, besides using

RE: Compilation error

2015-03-10 Thread java8964
Or another option is to use Scala-IDE, which is built on top of Eclipse, instead of pure Eclipse, so Scala comes with it. Yong From: so...@cloudera.com Date: Tue, 10 Mar 2015 18:40:44 + Subject: Re: Compilation error To: mohitanch...@gmail.com CC: t...@databricks.com;

Re: Compilation error

2015-03-10 Thread Mohit Anchlia
I am using Maven and my dependency looks like this, but it doesn't seem to be working: <dependencies> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-streaming_2.10</artifactId> <version>1.2.0</version> </dependency> <dependency> <groupId>org.apache.spark</groupId>
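
Following the advice later in the thread, the usual fix is to add the Scala library to the POM explicitly (a sketch; the version should match the _2.10 artifact suffix):

    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.10.4</version>
    </dependency>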

Re: Compilation error

2015-03-10 Thread Tathagata Das
See if you can import the Scala libraries in your project. On Tue, Mar 10, 2015 at 11:32 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I am using Maven and my dependency looks like this, but it doesn't seem to be working: <dependencies> <dependency> <groupId>org.apache.spark</groupId>

ec2 persistent-hdfs with ebs using spot instances

2015-03-10 Thread Deborah Siegel
Hello, I'm new to EC2. I've set up a Spark cluster on EC2 and am using persistent-hdfs with the data nodes mounting EBS. I launched my cluster using spot instances: ./spark-ec2 -k mykeypair -i ~/aws/mykeypair.pem -t m3.xlarge -s 4 -z us-east-1c --spark-version=1.2.0 --spot-price=.0321

How to pass parameters to spark-shell when choosing client mode --master yarn-client

2015-03-10 Thread Shuai Zheng
Hi All, I try to pass parameters to the spark-shell when I do some tests: spark-shell --driver-memory 512M --executor-memory 4G --master spark://:7077 --conf spark.sql.parquet.compression.codec=snappy --conf spark.sql.parquet.binaryAsString=true This works fine on my local PC. And

Re: Setting up Spark with YARN on EC2 cluster

2015-03-10 Thread Deborah Siegel
Harika, I think you can modify an existing Spark-on-EC2 cluster to run YARN MapReduce; not sure if this is what you are looking for. To try: 1) log on to the master 2) go into either ephemeral-hdfs/conf/ or persistent-hdfs/conf/ and add this to mapred-site.xml: <property>

Re: Compilation error

2015-03-10 Thread Mohit Anchlia
I ran the dependency command and see the following dependencies; I only see org.scala-lang. [INFO] org.spark.test:spak-test:jar:0.0.1-SNAPSHOT [INFO] +- org.apache.spark:spark-streaming_2.10:jar:1.2.0:compile [INFO] | +- org.eclipse.jetty:jetty-server:jar:8.1.14.v20131031:compile [INFO] | |

Re: ANSI Standard Supported by the Spark-SQL

2015-03-10 Thread Ravindra
From the archives in this user list, it seems that Spark-SQL is yet to achieve SQL 92 level. But there are a few things still not clear. 1. This is from an old post dated Aug 09, 2014. 2. It clearly says that it doesn't support DDL and DML operations. Does that mean all reads (select) are sql

Re: Spark with Spring

2015-03-10 Thread Akhil Das
It will be good if you can explain the entire use case, like what kind of requests, what sort of processing, etc. Thanks Best Regards On Mon, Mar 9, 2015 at 11:18 PM, Tarun Garg bigdat...@live.com wrote: Hi, I have an existing web-based system which receives requests and processes them. This

Re: Joining data using Latitude, Longitude

2015-03-10 Thread Akhil Das
Are you using SparkSQL for the join? In that case I'm not quite sure you have a lot of options to join on the nearest co-ordinate. If you are using the normal Spark code (by creating key pairs on lat,lon) you can apply certain logic like trimming the lat,lon etc. If you want more specific computing

Re: Spark History server default conf values

2015-03-10 Thread Charles Feduke
What I found from a quick search of the Spark source code (from my local snapshot on January 25, 2015): // Interval between each check for event log updates private val UPDATE_INTERVAL_MS = conf.getInt("spark.history.fs.updateInterval", conf.getInt("spark.history.updateInterval", 10)) * 1000
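
Based on that snippet, the check interval is read in seconds from spark.history.fs.updateInterval, falling back to the older spark.history.updateInterval. Setting it might look like this (a sketch; the value is arbitrary):

    # spark-defaults.conf
    spark.history.fs.updateInterval  30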

Re: Spark Streaming testing strategies

2015-03-10 Thread Marcin Kuthan
I would expect a base trait for testing purposes in the Spark distribution. ManualClock should be exposed as well, along with some documentation on how to configure SBT to avoid problems with multiple Spark contexts. I'm going to create an improvement proposal on the Spark issue tracker about it. Right now I

Re: Spark History server default conf values

2015-03-10 Thread Srini Karri
Thank you Charles and Meethu. On Tue, Mar 10, 2015 at 12:47 AM, Charles Feduke charles.fed...@gmail.com wrote: What I found from a quick search of the Spark source code (from my local snapshot on January 25, 2015): // Interval between each check for event log updates private val

Hadoop Map vs Spark stream Map

2015-03-10 Thread Mohit Anchlia
Hi, I am trying to understand the Hadoop map method compared to the Spark map, and I noticed that the Spark map only receives 3 arguments: 1) input value 2) output key 3) output value, whereas the Hadoop map has 4: 1) input key 2) input value 3) output key 4) output value. Is there any reason it was
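
One reason the signatures differ: a Spark pair RDD element is just a Scala tuple, so the "input key" arrives as part of the single input element rather than as a separate parameter. A minimal sketch:

    // The element of a Spark pair RDD is a (key, value) tuple; map sees both at once.
    import org.apache.spark.rdd.RDD

    def toLengths(pairs: RDD[(String, String)]): RDD[(String, Int)] =
      pairs.map { case (k, v) => (k, v.length) } // emit (output key, output value)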

Re: Compilation error on JavaPairDStream

2015-03-10 Thread Mohit Anchlia
Works now. I should have checked :) On Tue, Mar 10, 2015 at 1:44 PM, Sean Owen so...@cloudera.com wrote: Ah, that's a typo in the example: use words.mapToPair. I can make a little PR to fix that. On Tue, Mar 10, 2015 at 8:32 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am getting

Re: Spark 1.3 SQL Type Parser Changes?

2015-03-10 Thread Michael Armbrust
Thanks for reporting. This was a result of a change to our DDL parser that resulted in types becoming reserved words. I've filed a JIRA and will investigate if this is something we can fix. https://issues.apache.org/jira/browse/SPARK-6250 On Tue, Mar 10, 2015 at 1:51 PM, Nitay Joffe

Re: SchemaRDD: SQL Queries vs Language Integrated Queries

2015-03-10 Thread Reynold Xin
They should have the same performance, as they are compiled down to the same execution plan. Note that starting in Spark 1.3, SchemaRDD is renamed DataFrame: https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html On Tue, Mar 10, 2015 at 2:13
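
For example, the two styles below should compile to the same plan (a sketch against the 1.2 SchemaRDD API; `sqlContext` and the `people` table are assumed placeholders):

    // Assumes a SQLContext `sqlContext` and a SchemaRDD `people` registered
    // as the table "people".
    import sqlContext._ // brings in sql() and the Symbol-based DSL implicits

    val viaSql = sql("SELECT name FROM people WHERE age >= 21")
    val viaDsl = people.where('age >= 21).select('name) // language-integrated form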

Re: Joining data using Latitude, Longitude

2015-03-10 Thread John Meehan
There are some techniques you can use if you geohash (http://en.wikipedia.org/wiki/Geohash) the lat-lngs. They will naturally be sorted by proximity (with some edge cases, so watch out). If you go the join route, either by trimming the lat-lngs or geohashing them, you're essentially grouping
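
A sketch of the trimming/grouping idea (the cell size is an arbitrary illustration, and records near cell borders need the neighbouring cells checked too):

    // Key both sides by a coarse grid cell, then join within cells.
    import org.apache.spark.rdd.RDD

    case class Rec(lat: Double, lng: Double, payload: String)

    def cell(lat: Double, lng: Double): (Long, Long) =
      ((lat * 100).toLong, (lng * 100).toLong) // ~1 km cells near the equator

    def joinByCell(left: RDD[Rec], right: RDD[Rec]): RDD[(Rec, Rec)] =
      left.map(r => (cell(r.lat, r.lng), r))
        .join(right.map(r => (cell(r.lat, r.lng), r)))
        .values // co-located candidate pairs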

RE: Spark SQL Stackoverflow error

2015-03-10 Thread jishnu.prathap
import com.google.gson.{GsonBuilder, JsonParser} import org.apache.spark.mllib.clustering.KMeans import org.apache.spark.sql.SQLContext import org.apache.spark.{SparkConf, SparkContext} /** * Examine the collected tweets and train a model based on

Re: Workaround for spark 1.2.X roaringbitmap kryo problem?

2015-03-10 Thread Arun Luthra
Does anyone know how to get the HighlyCompressedMapStatus to compile? I will try turning off kryo in 1.2.0 and hope things don't break. I want to benefit from the MapOutputTracker fix in 1.2.0. On Tue, Mar 3, 2015 at 5:41 AM, Imran Rashid iras...@cloudera.com wrote: the scala syntax for

Re: Spark 1.3 SQL Type Parser Changes?

2015-03-10 Thread Yin Huai
Hi Nitay, Can you try using backticks to quote the column name? Like org.apache.spark.sql.hive.HiveMetastoreTypes.toDataType("struct<`int`:bigint>")? Thanks, Yin On Tue, Mar 10, 2015 at 2:43 PM, Michael Armbrust mich...@databricks.com wrote: Thanks for reporting. This was a result of a change

SQL with Spark Streaming

2015-03-10 Thread Mohit Anchlia
Does Spark Streaming also support SQL? Something like how Esper does CEP.

RE: Registering custom UDAFs with HiveContext in SparkSQL, how?

2015-03-10 Thread Cheng, Hao
Oh, sorry, my bad. Currently Spark SQL doesn't provide a user interface for UDAFs, but it can work seamlessly with Hive UDAFs (via HiveContext). I am also working on the UDAF interface refactoring; after that we can provide the custom interface for extension.
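
Using a Hive UDAF through HiveContext might look like this (a sketch; the jar path, function name, and class are placeholders):

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext("local[*]", "hive-udaf")
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("ADD JAR /path/to/my-udaf.jar")
    hiveContext.sql("CREATE TEMPORARY FUNCTION my_agg AS 'com.example.MyUdaf'")
    hiveContext.sql("SELECT key, my_agg(value) FROM src GROUP BY key").collect()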

RE: [SparkSQL] Reuse HiveContext to different Hive warehouse?

2015-03-10 Thread Cheng, Hao
I am not so sure if Hive supports changing the metastore after it has been initialized; I guess not. Spark SQL relies entirely on the Hive metastore in HiveContext, which is probably why it doesn't work as expected for Q1. BTW, in most cases people configure the metastore settings in hive-site.xml, and will not

Numbering RDD members Sequentially

2015-03-10 Thread Steve Lewis
I have a Hadoop InputFormat which reads records and produces a JavaPairRDD<String,String> locatedData where _1() is a formatted version of the file location - like 12690,, 24386 .27523 ... _2() is the data to be processed. For historical reasons I want to convert _1() into an integer
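
One standard approach, if numbering by the RDD's partition order is acceptable, is zipWithIndex (a sketch in Scala; the function stands in for the poster's locatedData pairs):

    // Replace the formatted-location key with a sequential Long id.
    import org.apache.spark.rdd.RDD

    def numberSequentially(located: RDD[(String, String)]): RDD[(Long, String)] =
      located.zipWithIndex().map { case ((loc, data), idx) => (idx, data) }

zipWithUniqueId avoids the extra job that zipWithIndex triggers, at the cost of the ids not being consecutive.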

RE: SQL with Spark Streaming

2015-03-10 Thread Cheng, Hao
Intel has a prototype for doing this; SaiSai and Jason are the authors. Probably you can ask them for some materials. From: Mohit Anchlia [mailto:mohitanch...@gmail.com] Sent: Wednesday, March 11, 2015 8:12 AM To: user@spark.apache.org Subject: SQL with Spark Streaming Does Spark Streaming also