Re: Multi-tenancy for Spark (Streaming) Applications

2014-09-11 Thread Tobias Pfeiffer
Hi, by now I understood maybe a bit better how spark-submit and YARN play together and how the Spark driver and slaves play together on YARN. Now for my use case, as described on https://spark.apache.org/docs/latest/submitting-applications.html, I would probably have an end-user-facing gateway that

Re: can fileStream() or textFileStream() remember state?

2014-09-11 Thread vasiliy
When you get a stream from sc.fileStream(), Spark will process only files with a file timestamp newer than the current timestamp, so data already in HDFS should not be processed again. You may have another problem: Spark will not process files that were moved into your HDFS folder between your application restarts.
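
A minimal sketch of the behavior described above (directory path, app name and batch interval are illustrative, not from the thread):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("FileStreamDemo").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))
    // textFileStream only picks up files whose modification time is newer than
    // the stream's start; files that landed while the app was down are skipped.
    val lines = ssc.textFileStream("hdfs:///data/incoming")
    lines.count().print()
    ssc.start()
    ssc.awaitTermination()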

SchemaRDD saveToCassandra

2014-09-11 Thread lmk
Hi, my requirement is to extract certain fields from JSON files, run queries on them and save the result to Cassandra. I was able to parse the JSON, filter the result and save the (regular) RDD to Cassandra. Now, when I try to read the JSON file through sqlContext, execute some queries on the same

Spark not installed + no access to web UI

2014-09-11 Thread mrm
Hi, I have been launching Spark the same way for the past months, but I have only recently started to have problems with it. I launch Spark using the spark-ec2 script, but then I cannot access the web UI when I type address:8080 into the browser (it doesn't work with lynx either from the master

Unpersist

2014-09-11 Thread Deep Pradhan
I want to create a temporary variable in Spark code. Can I do this? for (i <- num) { val temp = ... { do something } temp.unpersist() } Thank You

Re: Spark not installed + no access to web UI

2014-09-11 Thread Akhil Das
Which version of Spark are you using? Thanks Best Regards On Thu, Sep 11, 2014 at 3:10 PM, mrm ma...@skimlinks.com wrote: Hi, I have been launching Spark the same way for the past months, but I have only recently started to have problems with it. I launch Spark using spark-ec2

Re: Unpersist

2014-09-11 Thread Akhil Das
like this? var temp = ... for (i <- num) { temp = ... { do something } temp.unpersist() } Thanks Best Regards On Thu, Sep 11, 2014 at 3:26 PM, Deep Pradhan pradhandeep1...@gmail.com wrote: I want to create a temporary variable in Spark code. Can I do this? for (i <- num) {
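
A runnable sketch of the pattern Akhil suggests, with placeholder computations standing in for the poster's loop body (names and values are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    val sc = new SparkContext(new SparkConf().setAppName("UnpersistLoop").setMaster("local[2]"))
    var temp: RDD[Int] = null
    for (i <- 1 to 3) {
      temp = sc.parallelize(1 to 100).map(_ * i).cache()   // "do something"
      println(temp.count())   // materialize and use the RDD before dropping it
      temp.unpersist()        // frees the cached blocks; the variable is reused next iteration
    }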

Re: Spark not installed + no access to web UI

2014-09-11 Thread mrm
I tried 1.0.0, 1.0.1 and 1.0.2. I also tried the latest github commit. After several hours trying to launch it, it now seems to be working; this is what I did (not sure if any of these steps helped): 1/ clone the spark repo into the master node 2/ run sbt/sbt assembly 3/ copy spark and spark-ec2

JMXSink for YARN deployment

2014-09-11 Thread Vladimir Tretyakov
Hello, we at Sematext (https://apps.sematext.com/) are writing a monitoring tool for Spark, and we came across one question: how to enable JMX metrics for a YARN deployment? We put *.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink in the file $SPARK_HOME/conf/metrics.properties but it doesn't

Re: How to scale more consumer to Kafka stream

2014-09-11 Thread richiesgr
Thanks for all the suggestions; I'm going to check both solutions. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-scale-more-consumer-to-Kafka-stream-tp13883p13959.html

Spark streaming stops computing while the receiver keeps running without any errors reported

2014-09-11 Thread Aniket Bhatnagar
Hi all, I am trying to run a Kinesis Spark Streaming application on a standalone Spark cluster. The job works fine in local mode, but when I submit it (using spark-submit), it doesn't do anything. I enabled logs for the org.apache.spark.streaming.kinesis package and I regularly get the following in

problem in using Spark-Cassandra connector

2014-09-11 Thread Karunya Padala
Hi, I am new to Spark. I encountered an issue when trying to connect to Cassandra using the Spark Cassandra connector. Can anyone help me? Following are the details. 1) Following are the Spark and Cassandra versions I am using on Lubuntu 12.0: i) spark-1.0.2-bin-hadoop2 ii) apache-cassandra-2.0.10 2) In

Re: problem in using Spark-Cassandra connector

2014-09-11 Thread Reddy Raja
You will have to create the keyspace and table. See the message: Table not found: EmailKeySpace.Emails. Looks like you have not created the Emails table. On Thu, Sep 11, 2014 at 6:04 PM, Karunya Padala karunya.pad...@infotech-enterprises.com wrote: Hi, I am new to spark. I

RE: problem in using Spark-Cassandra connector

2014-09-11 Thread Karunya Padala
I have created a keyspace called 'EmailKeySpace' and a table called 'Emails' and inserted some data into Cassandra. See my Cassandra console screenshot. Regards, Karunya. From: Reddy Raja [mailto:areddyr...@gmail.com] Sent: 11 September 2014 18:07 To: Karunya

Spark on Raspberry Pi?

2014-09-11 Thread Sandeep Singh
Has anyone tried using Raspberry Pi for Spark? How efficient is it to use around 10 Pis for a local testing env? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-Raspberry-Pi-tp13965.html

Re: How to scale more consumer to Kafka stream

2014-09-11 Thread Dibyendu Bhattacharya
I agree, Gerard. Thanks for pointing this out. Dib On Thu, Sep 11, 2014 at 5:28 PM, Gerard Maas gerard.m...@gmail.com wrote: This pattern works. One note, though: use 'union' only if you need to group the data from all RDDs into one RDD for processing (like count distinct or need a groupBy).

Fwd: Spark on Raspberry Pi?

2014-09-11 Thread Chen He
Pi's bus speed, memory size and access speed, and processing ability are limited. The only benefit could be the power consumption. On Thu, Sep 11, 2014 at 8:04 AM, Sandeep Singh sand...@techaddict.me wrote: Has anyone tried using Raspberry Pi for Spark? How efficient is it to use around 10

RE: JMXSink for YARN deployment

2014-09-11 Thread Shao, Saisai
Hi, I’m guessing the problem is that the driver or executor cannot get the metrics.properties configuration file in the YARN container, so the metrics system cannot load the right sinks. Thanks Jerry From: Vladimir Tretyakov [mailto:vladimir.tretya...@sematext.com] Sent: Thursday, September 11, 2014

unable to create new native thread

2014-09-11 Thread arthur.hk.c...@gmail.com
Hi, I am trying the Spark sample program “SparkPi” and I got the error “unable to create new native thread”; how do I resolve this? 14/09/11 21:36:16 INFO scheduler.DAGScheduler: Completed ResultTask(0, 644) 14/09/11 21:36:16 INFO scheduler.TaskSetManager: Finished TID 643 in 43 ms on node1 (progress:

Re: JMXSink for YARN deployment

2014-09-11 Thread Vladimir Tretyakov
Hi Shao, thanks for the explanation. Any ideas how to fix it? Where should I put the metrics.properties file? On Thu, Sep 11, 2014 at 4:18 PM, Shao, Saisai saisai.s...@intel.com wrote: Hi, I’m guessing the problem is that the driver or executor cannot get the metrics.properties configuration file in the

RE: JMXSink for YARN deployment

2014-09-11 Thread Shao, Saisai
I think you can try to use "spark.metrics.conf" to manually specify the path of metrics.properties, but the prerequisite is that each container should be able to find this file on its local FS, because this file is loaded locally. Besides, I think this is a kind of workaround; a better solution is

Re: Unpersist

2014-09-11 Thread Deep Pradhan
After every loop I want the temp variable to cease to exist. On Thu, Sep 11, 2014 at 4:33 PM, Akhil Das ak...@sigmoidanalytics.com wrote: like this? var temp = ... for (i <- num) { temp = ... { do something } temp.unpersist() } Thanks Best Regards On Thu, Sep 11, 2014

Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2014-09-11 Thread Nan Zhu
Hi, can you attach more logs to see if there is some entry from ContextCleaner? I met a very similar issue before… but it hasn't been resolved. Best, -- Nan Zhu On Thursday, September 11, 2014 at 10:13 AM, Dibyendu Bhattacharya wrote: Dear All, Not sure if this is a false alarm.

Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2014-09-11 Thread Nan Zhu
This is my case about broadcast variable: 14/07/21 19:49:13 INFO Executor: Running task ID 4 14/07/21 19:49:13 INFO DAGScheduler: Completed ResultTask(0, 2) 14/07/21 19:49:13 INFO TaskSetManager: Finished TID 2 in 95 ms on localhost (progress: 3/106) 14/07/21 19:49:13 INFO TableOutputFormat:

Re[2]: HBase 0.96+ with Spark 1.0+

2014-09-11 Thread spark
Hi guys, any luck with this issue, anyone? I as well tried all the possible exclusion combos, to no avail. Thanks for your ideas, reinis -Original Message- From: Stephen Boesch java...@gmail.com To: user user@spark.apache.org Date: 28-06-2014 15:12 Subject: Re: HBase 0.96+

Re: JMXSink for YARN deployment

2014-09-11 Thread Vladimir Tretyakov
Hi again, yeah, I've tried to use "spark.metrics.conf" before my question on the ML, with no luck :( Any other ideas from somebody? Seems nobody uses metrics in YARN deployment mode. How about Mesos? I didn't try, but maybe Spark has the same difficulties on Mesos? PS: Spark is a great thing in general,

Re: JMXSink for YARN deployment

2014-09-11 Thread Kousuke Saruta
Hi Vladimir, how about using the --files option with spark-submit? - Kousuke (2014/09/11 23:43), Vladimir Tretyakov wrote: Hi again, yeah, I've tried to use "spark.metrics.conf" before my question on the ML, with no luck :( Any other ideas from somebody? Seems nobody uses metrics in YARN deployment

Python execution support on clusters

2014-09-11 Thread david_allanus
Is there some doc that I missed that describes which execution engines Python is supported on with Spark? If we use spark-submit with a YARN cluster, an error is produced saying 'Error: Cannot currently run Python driver programs on cluster'. Thanks in advance, David

Re: Table not found: using jdbc console to query sparksql hive thriftserver

2014-09-11 Thread alexandria1101
Thank you!! I can do this using saveAsTable with the SchemaRDD, right? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Table-not-found-using-jdbc-console-to-query-sparksql-hive-thriftserver-tp13840p13979.html

compiling spark source code

2014-09-11 Thread rapelly kartheek
Hi, can someone please tell me how to compile the Spark source code so that changes to the source take effect? I was trying to ship the jars to all the slaves, but in vain. -Karthik

Out of memory with Spark Streaming

2014-09-11 Thread Aniket Bhatnagar
I am running a simple Spark Streaming program that pulls in data from Kinesis at a batch interval of 10 seconds, windows it for 10 seconds, maps data and persists to a store. The program is running in local mode right now and runs out of memory after a while. I am yet to investigate heap dumps

Re: efficient zipping of lots of RDDs

2014-09-11 Thread Mohit Jaggi
filed jira SPARK-3489 https://issues.apache.org/jira/browse/SPARK-3489 On Thu, Sep 4, 2014 at 9:36 AM, Mohit Jaggi mohitja...@gmail.com wrote: Folks, I sent an email announcing https://github.com/AyasdiOpenSource/df This dataframe is basically a map of RDDs of columns(along with DSL

Re: Setting up jvm in pyspark from shell

2014-09-11 Thread Davies Liu
The heap size of the JVM cannot be changed dynamically, so you need to configure it before running pyspark. If you run it in local mode, you should configure spark.driver.memory (in 1.1 or master). Or, you can use --driver-memory 2G (should work in 1.0+). On Wed, Sep 10, 2014 at 10:43 PM, Mohit Singh
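
For example (memory value illustrative), the second option amounts to setting the driver heap at launch, since it cannot be changed at runtime:

    ./bin/pyspark --driver-memory 2g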

Re: compiling spark source code

2014-09-11 Thread Daniil Osipov
In the spark source folder, execute `sbt/sbt assembly` On Thu, Sep 11, 2014 at 8:27 AM, rapelly kartheek kartheek.m...@gmail.com wrote: HI, Can someone please tell me how to compile the spark source code to effect the changes in the source code. I was trying to ship the jars to all the

Re: JMXSink for YARN deployment

2014-09-11 Thread Vladimir Tretyakov
Hi Kousuke, can you please explain in a bit more detail what you mean? I am new to Spark; I looked at https://spark.apache.org/docs/latest/submitting-applications.html and it seems there is no '--files' option there. Do I just have to add '--files /path-to-metrics.properties'? An undocumented ability? Thx for
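
Putting Jerry's and Kousuke's suggestions together, a hedged sketch (paths, class and jar names are illustrative; --conf requires a recent spark-submit):

    # conf/metrics.properties (the sink line from the original question):
    *.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink

    # Ship the file to every YARN container and point Spark at the local copy:
    ./bin/spark-submit --master yarn-cluster \
      --files /path/to/metrics.properties \
      --conf spark.metrics.conf=metrics.properties \
      --class com.example.MyApp myapp.jar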

Re: Spark on Raspberry Pi?

2014-09-11 Thread Daniil Osipov
Limited memory could also cause you some problems and limit usability. If you're looking for a local testing environment, vagrant boxes may serve you much better. On Thu, Sep 11, 2014 at 6:18 AM, Chen He airb...@gmail.com wrote: Pi's bus speed, memory size and access speed, and processing

Re: Out of memory with Spark Streaming

2014-09-11 Thread Aniket Bhatnagar
I did change it to 1 GB. It still ran out of memory, just a little later. The streaming job isn't handling a lot of data; in every 2 seconds it doesn't get more than 50 records, and each record is no more than 500 bytes. On Sep 11, 2014 10:54 PM, Bharat Venkat bvenkat.sp...@gmail.com wrote:

Re: Spark on Raspberry Pi?

2014-09-11 Thread Aniket Bhatnagar
Just curious... what's the use case you are looking to implement? On Sep 11, 2014 10:50 PM, Daniil Osipov daniil.osi...@shazam.com wrote: Limited memory could also cause you some problems and limit usability. If you're looking for a local testing environment, vagrant boxes may serve you much

Re: Spark on Raspberry Pi?

2014-09-11 Thread Chanwit Kaewkasi
We've found that Raspberry Pi is not enough for Hadoop/Spark, mainly because of the memory consumption. What we've built is a cluster formed of 22 Cubieboards, each containing 1 GB of RAM. Best regards, -chanwit -- Chanwit Kaewkasi linkedin.com/in/chanwit On Thu, Sep 11, 2014 at 8:04 PM, Sandeep Singh

Re: Spark SQL JDBC

2014-09-11 Thread alexandria1101
Even when I comment out those 3 lines, I still get the same error. Did someone solve this? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-JDBC-tp11369p13992.html

Re: spark on yarn history server + hdfs permissions issue

2014-09-11 Thread Greg Hill
To answer my own question, in case someone else runs into this: the spark user needs to be in the same group on the namenode, and HDFS caches that information for what seems like at least an hour. It magically started working on its own. Greg From: Greg

Re: Re[2]: HBase 0.96+ with Spark 1.0+

2014-09-11 Thread Aniket Bhatnagar
Dependency hell... my favorite problem :). I had run into a similar issue with HBase and Jetty. I can't remember the exact fix, but here are excerpts from my dependencies that may be relevant: val hadoop2Common = "org.apache.hadoop" % "hadoop-common" % hadoop2Version excludeAll(
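
A hedged sbt sketch of that exclusion approach (organizations and version are illustrative, not Aniket's exact fix):

    val hadoop2Version = "2.4.0"
    // Exclude the servlet-api and Jetty trees that clash with Spark's own.
    val hadoop2Common = "org.apache.hadoop" % "hadoop-common" % hadoop2Version excludeAll(
      ExclusionRule(organization = "javax.servlet"),
      ExclusionRule(organization = "org.mortbay.jetty")
    )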

Network requirements between Driver, Master, and Slave

2014-09-11 Thread Jim Carroll
Hello all, I'm trying to run a Driver on my local network with a deployment on EC2 and it's not working. I was wondering if either the master or slave instances (in standalone) connect back to the driver program. I outlined the details of my observations in a previous post but here is what I'm

SparkSQL HiveContext TypeTag compile error

2014-09-11 Thread Du Li
Hi, I have the following code snippet. It works fine in spark-shell, but in a standalone app it reports “No TypeTag available for MySchema” at compile time when calling hc.createSchemaRDD(rdd). Anybody know what might be missing? Thanks, Du -- import org.apache.spark.sql.hive.HiveContext

Re: SchemaRDD saveToCassandra

2014-09-11 Thread Michael Armbrust
This might be a better question to ask on the cassandra mailing list as I believe that is where the exception is coming from. On Thu, Sep 11, 2014 at 2:37 AM, lmk lakshmi.muralikrish...@gmail.com wrote: Hi, My requirement is to extract certain fields from json files, run queries on them and

Reading from multiple sockets

2014-09-11 Thread Varad Joshi
Still fairly new to Spark, so please bear with me. I am trying to write a streaming app that has multiple workers that read from sockets and process the data. Here is a very simplified version of what I am trying to do: val carStreamSeq = (1 to 2).map( _ => ssc.socketTextStream(host, port)
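
A cleaned-up sketch of that pattern, assuming an existing StreamingContext ssc (host and port are illustrative):

    // Several socket receivers unioned into a single DStream for processing.
    val carStreamSeq = (1 to 2).map(_ => ssc.socketTextStream("localhost", 9999))
    val carStream = ssc.union(carStreamSeq)
    carStream.count().print()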

RE: cannot read file form a local path

2014-09-11 Thread Mozumder, Monir
I am seeing this same issue with Spark 1.0.1 (tried with file:// for a local file): scala> val lines = sc.textFile("file:///home/monir/.bashrc") lines: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12 scala> val linecount = lines.count

Re: Spark SQL and running parquet tables?

2014-09-11 Thread DanteSama
Michael Armbrust wrote: You'll need to run parquetFile(path).registerTempTable(name) to refresh the table. I'm not seeing that function on SchemaRDD in 1.0.2; is there something I'm missing? SchemaRDD Scaladoc
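
If memory serves, the 1.0.x spelling of that method was registerAsTable; registerTempTable arrived in 1.1. A sketch against 1.0.2 (path and table name are illustrative):

    // Spark 1.0.x: reload the parquet file and re-register it under the same name.
    sqlContext.parquetFile("/path/to/table").registerAsTable("name")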

Re: SparkSQL HiveContext TypeTag compile error

2014-09-11 Thread Du Li
Solved it. The problem occurred because the case class was defined within a test case in FunSuite. Moving the case class definition out of test fixed the problem. From: Du Li l...@yahoo-inc.com.INVALIDmailto:l...@yahoo-inc.com.INVALID Date: Thursday, September 11, 2014 at 11:25 AM To:

single worker vs multiple workers on each machine

2014-09-11 Thread Mike Sam
Hi There, I am new to Spark and I was wondering when you have so much memory on each machine of the cluster, is it better to run multiple workers with limited memory on each machine or is it better to run a single worker with access to the majority of the machine memory? If the answer is it

spark sql - create new_table as select * from table

2014-09-11 Thread jamborta
Hi, I am trying to create a new table from a select query as follows: CREATE TABLE IF NOT EXISTS new_table ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION '/user/test/new_table' AS select * from table this works in Hive, but in Spark SQL

Re: spark sql - create new_table as select * from table

2014-09-11 Thread Du Li
The implementation of SparkSQL is currently incomplete. You may try it out with HiveContext instead of SQLContext. On 9/11/14, 1:21 PM, jamborta jambo...@gmail.com wrote: Hi, I am trying to create a new table from a select query as follows: CREATE TABLE IF NOT EXISTS new_table ROW FORMAT
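
A minimal sketch of routing the same statement through HiveContext, assuming an existing SparkContext sc (on 1.0.x the entry point is hql; on 1.1 it is sql):

    import org.apache.spark.sql.hive.HiveContext

    val hc = new HiveContext(sc)
    hc.hql("CREATE TABLE IF NOT EXISTS new_table AS SELECT * FROM table")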

Re: SparkSQL HiveContext TypeTag compile error

2014-09-11 Thread Du Li
Just moving it out of the test is not enough; the case class definition must be moved to the top level. Otherwise it reports a runtime error of 'task not serializable' when executing collect(). From: Du Li l...@yahoo-inc.com.INVALID Date: Thursday, September 11,
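
A sketch of the fix Du Li describes, with the case class hoisted to the top level (object, field and table names are illustrative):

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.hive.HiveContext

    // Top level: the compiler can now materialize a TypeTag for it,
    // and instances serialize cleanly inside tasks.
    case class MySchema(id: Int, name: String)

    object SchemaDemo {
      def run(sc: SparkContext): Unit = {
        val hc = new HiveContext(sc)
        import hc.createSchemaRDD   // implicit RDD[A <: Product] => SchemaRDD
        val rdd = sc.parallelize(Seq(MySchema(1, "a"), MySchema(2, "b")))
        rdd.registerAsTable("my_schema")   // 1.0.x name; registerTempTable in 1.1
        rdd.collect().foreach(println)     // no "task not serializable" now
      }
    }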

Re: spark sql - create new_table as select * from table

2014-09-11 Thread jamborta
Thanks. This was actually using HiveContext. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-sql-create-new-table-as-select-from-table-tp14006p14009.html

RE: cannot read file form a local path

2014-09-11 Thread Mozumder, Monir
Starting spark-shell in local mode seems to solve this, but even then it cannot read a file whose name begins with a '.': MASTER=local[4] ./bin/spark-shell ... scala> val lineCount = sc.textFile("/home/monir/ref").count lineCount: Long = 68 scala> val lineCount2 =

Re[2]: HBase 0.96+ with Spark 1.0+

2014-09-11 Thread spark
Thank you, Aniket, for your hint! Alas, I am facing a really hellish situation, it seems, because I have integration tests using BOTH Spark and HBase (minicluster). Thus I get either: class javax.servlet.ServletRegistration's signer information does not match signer information of other classes

Re: Out of memory with Spark Streaming

2014-09-11 Thread Tathagata Das
Which version of Spark are you running? If you are running the latest one, could you try running not a window but a simple event count on every 2-second batch, and see if you are still running out of memory? TD On Thu, Sep 11, 2014 at 10:34 AM, Aniket Bhatnagar aniket.bhatna...@gmail.com
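
A sketch of the stripped-down test TD suggests, with a socket source standing in for Kinesis (all names and values are illustrative):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("OOMRepro").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(2))
    // No window, no store: just count each 2-second batch and watch the heap.
    ssc.socketTextStream("localhost", 9999).count().print()
    ssc.start()
    ssc.awaitTermination()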

Re: Spark streaming stops computing while the receiver keeps running without any errors reported

2014-09-11 Thread Tathagata Das
This is very puzzling, given that this works in the local mode. Does running the kinesis example work with your spark-submit? https://github.com/apache/spark/blob/master/extras/kinesis-asl/src/main/scala/org/apache/spark/examples/streaming/KinesisWordCountASL.scala The instructions are present

Re: Re[2]: HBase 0.96+ with Spark 1.0+

2014-09-11 Thread Sean Owen
This was already answered at the bottom of this same thread -- read below. On Thu, Sep 11, 2014 at 9:51 PM, sp...@orbit-x.de wrote: class javax.servlet.ServletRegistration's signer information does not match signer information of other classes in the same package java.lang.SecurityException:

SparkContext and multi threads

2014-09-11 Thread moon soo Lee
Hi, I'm trying to make Spark work in a multithreaded Java application. What I'm trying to do is: - create a single SparkContext - create multiple SparkILoops and SparkIMains - inject the created SparkContext into each SparkIMain interpreter. A thread is created for every user request and takes a SparkILoop and
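
The interpreter-injection part aside, the shared-context half of this design is workable, since a single SparkContext accepts job submissions from multiple threads. A sketch (pool size and job bodies are illustrative):

    import java.util.concurrent.Executors
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("SharedSC").setMaster("local[4]"))
    val pool = Executors.newFixedThreadPool(4)
    for (i <- 1 to 4) pool.submit(new Runnable {
      // Each request thread submits its own jobs against the one shared context.
      def run(): Unit = println("request " + i + ": " + sc.parallelize(1 to 1000).count())
    })
    pool.shutdown()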

Re: spark sql - create new_table as select * from table

2014-09-11 Thread Yin Huai
What is the schema of the table? On Thu, Sep 11, 2014 at 4:30 PM, jamborta jambo...@gmail.com wrote: thanks. this was actually using hivecontext. -- View this message in context:

Re: Re: Spark SQL -- more than two tables for join

2014-09-11 Thread Yin Huai
1.0.1 does not have support for outer joins (added in 1.1). Can you try the 1.1 branch? On Wed, Sep 10, 2014 at 9:28 PM, boyingk...@163.com boyingk...@163.com wrote: Hi, michael: I think Arthur.hk.chan arthur.hk.c...@gmail.com isn't here now; I can show something: 1) my spark version is 1.0.1

Spark Streaming in 1 hour batch duration RDD files gets lost

2014-09-11 Thread Jeoffrey Lim
Hi, our Spark Streaming app is configured to pull data from Kafka with a 1-hour batch duration; it performs aggregation of the data by specific keys and stores the related RDDs to HDFS in the transform phase. We have tried a checkpoint of 7 days on the Kafka DStream to ensure that the generated stream

Backwards RDD

2014-09-11 Thread Victor Tso-Guillen
Iterating an RDD gives you each partition in order of its split index. I'd like to be able to get each partition in reverse order, but I'm having difficulty implementing the compute() method. I thought I could do something like this: override def getDependencies: Seq[Dependency[_]] = {
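
Not the compute()/getDependencies route, but a hedged alternative sketch: pull partitions from the driver in reverse split order (uses the Spark 1.x runJob signature with allowLocal):

    import scala.reflect.ClassTag
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    def backwards[T: ClassTag](sc: SparkContext, rdd: RDD[T]): Iterator[T] =
      (rdd.partitions.length - 1 to 0 by -1).iterator.flatMap { i =>
        // Fetch exactly one partition per job, last split first,
        // reversing the elements within it as well.
        sc.runJob(rdd, (it: Iterator[T]) => it.toArray, Seq(i), allowLocal = false)
          .head.reverseIterator
      }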

Announcing Spark 1.1.0!

2014-09-11 Thread Patrick Wendell
I am happy to announce the availability of Spark 1.1.0! Spark 1.1.0 is the second release on the API-compatible 1.X line. It is Spark's largest release ever, with contributions from 171 developers! This release brings operational and performance improvements in Spark core including a new

Configuring Spark for heterogenous hardware

2014-09-11 Thread Victor Tso-Guillen
So I have a bunch of hardware with different core and memory setups. Is there a way to do one of the following: 1. Express a ratio of cores to memory to retain. The spark worker config would represent all of the cores and all of the memory usable for any application, and the application would
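
There is no built-in ratio setting that I know of; the usual workaround is per-node values in each worker's spark-env.sh, a sketch (numbers illustrative):

    # conf/spark-env.sh on a big node:
    export SPARK_WORKER_CORES=16
    export SPARK_WORKER_MEMORY=64g

    # conf/spark-env.sh on a small node:
    export SPARK_WORKER_CORES=4
    export SPARK_WORKER_MEMORY=8g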

History server: ERROR ReplayListenerBus: Exception in parsing Spark event log

2014-09-11 Thread SK
Hi, I am using Spark 1.0.2 on a Mesos cluster. After I run my job, when I try to look at the detailed application stats using the history server at port 18080, the stats don't show up for some of the jobs even though the jobs completed successfully and the event logs are written to the log folder. The log

RE: Announcing Spark 1.1.0!

2014-09-11 Thread Haopu Wang
I see the binary packages include Hadoop 1, 2.3 and 2.4. Does Spark 1.1.0 support Hadoop 2.5.0, available at the address below? http://hadoop.apache.org/releases.html#11+August%2C+2014%3A+Release+2.5.0+available -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Friday,

coalesce on SchemaRDD in pyspark

2014-09-11 Thread Brad Miller
Hi all, I'm having some trouble with the coalesce and repartition functions for SchemaRDD objects in pyspark. When I run: sqlCtx.jsonRDD(sc.parallelize(['{"foo":"bar"}', '{"foo":"baz"}'])).coalesce(1) I get this error: Py4JError: An error occurred while calling o94.coalesce. Trace:

Re: Announcing Spark 1.1.0!

2014-09-11 Thread Tobias Pfeiffer
Hi, On Fri, Sep 12, 2014 at 9:12 AM, Patrick Wendell pwend...@gmail.com wrote: I am happy to announce the availability of Spark 1.1.0! Spark 1.1.0 is the second release on the API-compatible 1.X line. It is Spark's largest release ever, with contributions from 171 developers! Great,

RE: Announcing Spark 1.1.0!

2014-09-11 Thread Denny Lee
I’m not sure if I’m completely answering your question here, but I’m currently working (on OS X) with Hadoop 2.5 and I have used Spark 1.1 built against Hadoop 2.4 without any issues. On September 11, 2014 at 18:11:46, Haopu Wang (hw...@qilinsoft.com) wrote: I see the binary packages include hadoop 1,

Re: Table not found: using jdbc console to query sparksql hive thriftserver

2014-09-11 Thread Denny Lee
It sort of depends on the definition of efficiently. From a workflow perspective I would agree, but from an I/O perspective, wouldn’t there be the same multi-pass from the standpoint of the Hive context needing to push the data into HDFS? Saying this, if you’re pushing the data into HDFS and

RE: Announcing Spark 1.1.0!

2014-09-11 Thread Denny Lee
Please correct me if I’m wrong, but I was under the impression, as per the Maven repositories, that it was just to stay more in sync with the various versions of Hadoop. Looking at the latest documentation (https://spark.apache.org/docs/latest/building-with-maven.html), there are multiple Hadoop

RE: Announcing Spark 1.1.0!

2014-09-11 Thread Haopu Wang
The web page (https://spark.apache.org/docs/latest/building-with-maven.html) you pointed out says: “Because HDFS is not protocol-compatible across versions, if you want to read from HDFS, you’ll need to build Spark against the specific HDFS version in your environment.”
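
A hedged sketch of such a build for Spark 1.1 (profile and version flags per the page cited above; verify against your environment):

    mvn -Phadoop-2.4 -Dhadoop.version=2.5.0 -DskipTests clean package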

Applications status missing when Spark HA(zookeeper) enabled

2014-09-11 Thread jason chen
Hi guys, I configured Spark with the following in spark-env.sh: export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=host1:2181,host2:2181,host3:2181 -Dspark.deploy.zookeeper.dir=/spark" And I started spark-shell on one master, host1 (active):

RE: Announcing Spark 1.1.0!

2014-09-11 Thread Denny Lee
Yes, at least for my query scenarios, I have been able to use Spark 1.1 with Hadoop 2.4 against Hadoop 2.5. Note, Hadoop 2.5 is considered a relatively minor release (http://hadoop.apache.org/releases.html#11+August%2C+2014%3A+Release+2.5.0+available) where Hadoop 2.4 and 2.3 were considered

RE: Announcing Spark 1.1.0!

2014-09-11 Thread Haopu Wang
Got it, thank you, Denny! From: Denny Lee [mailto:denny.g@gmail.com] Sent: Friday, September 12, 2014 11:04 AM To: user@spark.apache.org; Haopu Wang; d...@spark.apache.org; Patrick Wendell Subject: RE: Announcing Spark 1.1.0! Yes, atleast for my query

Re: DistCP - Spark-based

2014-09-11 Thread Nicholas Chammas
I've created SPARK-3499 https://issues.apache.org/jira/browse/SPARK-3499 to track creating a Spark-based distcp utility. Nick On Tue, Aug 12, 2014 at 4:20 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Good question; I don't know of one but I believe people at Cloudera had some thoughts of

Re: Spark SQL JDBC

2014-09-11 Thread Denny Lee
When you re-ran sbt, did you clear out the packages first and ensure that the datanucleus jars were generated within lib_managed? I remember having to do that when I was testing out different configs. On Thu, Sep 11, 2014 at 10:50 AM, alexandria1101 alexandria.shea...@gmail.com wrote:

Re: Spark SQL Thrift JDBC server deployment for production

2014-09-11 Thread Denny Lee
Could you provide some context about running this in yarn-cluster mode? The Thrift server that's included within Spark 1.1 is based on Hive 0.12. Hive has been able to work against YARN since Hive 0.10. So when you start the thrift server, provided you copied the hive-site.xml over to the Spark

Re: Announcing Spark 1.1.0!

2014-09-11 Thread Tim Smith
Thanks for all the good work. Very excited about seeing more features and better stability in the framework. On Thu, Sep 11, 2014 at 5:12 PM, Patrick Wendell pwend...@gmail.com wrote: I am happy to announce the availability of Spark 1.1.0! Spark 1.1.0 is the second release on the

Re: Announcing Spark 1.1.0!

2014-09-11 Thread Matei Zaharia
Thanks to everyone who contributed to implementing and testing this release! Matei On September 11, 2014 at 11:52:43 PM, Tim Smith (secs...@gmail.com) wrote: Thanks for all the good work. Very excited about seeing more features and better stability in the framework. On Thu, Sep 11, 2014 at

Re: compiling spark source code

2014-09-11 Thread rapelly kartheek
I have been doing that, but none of my modifications to the code are being compiled in. On Thu, Sep 11, 2014 at 10:45 PM, Daniil Osipov daniil.osi...@shazam.com wrote: In the spark source folder, execute `sbt/sbt assembly` On Thu, Sep 11, 2014 at 8:27 AM, rapelly kartheek kartheek.m...@gmail.com

RE: Spark SQL JDBC

2014-09-11 Thread Cheng, Hao
I copied the 3 datanucleus jars (datanucleus-api-jdo-3.2.1.jar, datanucleus-core-3.2.2.jar, datanucleus-rdbms-3.2.1.jar) to the lib/ folder manually, and it works for me. From: Denny Lee [mailto:denny.g@gmail.com] Sent: Friday, September 12, 2014 11:28 AM To: alexandria1101 Cc:

Re: Table not found: using jdbc console to query sparksql hive thriftserver

2014-09-11 Thread Du Li
SchemaRDD has a method insertInto(table). When the table is partitioned, it would be more sensible and convenient to extend it with a list of partition keys and values. From: Denny Lee denny.g@gmail.com Date: Thursday, September 11, 2014 at 6:39 PM To: Du Li