Re: How to set local property in beeline connect to the spark thrift server

2014-12-31 Thread
Hi, Xiaoyu!

In the Spark Thrift Server you can set `spark.sql.thriftserver.scheduler.pool`
instead of `spark.scheduler.pool`; this property only applies to the Thrift Server.
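
For example, a minimal Beeline sketch (the pool name test comes from the thread
below; the table name is just a placeholder, not a verified command sequence):

  SET spark.sql.thriftserver.scheduler.pool=test;
  -- queries issued afterwards in this session should run in the test pool
  SELECT count(*) FROM some_table;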



On Wed, Dec 31, 2014 at 3:55 PM, Xiaoyu Wang wangxy...@gmail.com wrote:

 Hi all!

 I use Spark SQL 1.2 to start the Thrift Server on YARN.

 I want to use fair scheduler in the thrift server.

 I set the properties in spark-defaults.conf like this:
 spark.scheduler.mode FAIR
 spark.scheduler.allocation.file /opt/spark-1.2.0-bin-2.4.1/conf/fairscheduler.xml
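
 For reference, a minimal fairscheduler.xml sketch in the format from the
 job-scheduling docs (the weight/minShare values here are just an example,
 not my actual file):

   <?xml version="1.0"?>
   <allocations>
     <pool name="test">
       <schedulingMode>FAIR</schedulingMode>
       <weight>1</weight>
       <minShare>2</minShare>
     </pool>
   </allocations>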

 In the Thrift Server UI I can see that the scheduler pools are set up correctly.
 [image: inline image 1]

 How can I assign a SQL job to the test pool?
 By default the SQL jobs run in the default pool.

 In the http://spark.apache.org/docs/latest/job-scheduling.html document
 I see that sc.setLocalProperty("spark.scheduler.pool", "pool1") can be set in
 the code.

 In Beeline I executed set spark.scheduler.pool=test, but it had no effect.
 How can I set the local property in Beeline?




Got NotSerializableException when access broadcast variable

2014-08-20 Thread
Hi everyone!

I got an exception when I ran my script in spark-shell:

I added 

SPARK_JAVA_OPTS=-Dsun.io.serialization.extendedDebugInfo=true

in spark-env.sh to show the following stack:


org.apache.spark.SparkException: Task not serializable
at 
org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
at org.apache.spark.rdd.RDD.filter(RDD.scala:282)
at org.apache.spark.sql.SchemaRDD.filter(SchemaRDD.scala:460)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:18)
at $iwC$$iwC$$iwC.<init>(<console>:23)
at $iwC$$iwC.<init>(<console>:25)
at $iwC.<init>(<console>:27)
at <init>(<console>:29)
at .<init>(<console>:33)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
at 
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
……
Caused by: java.io.NotSerializableException: 
org.apache.spark.sql.hive.HiveContext$$anon$3
- field (class org.apache.spark.sql.hive.HiveContext, name: 
functionRegistry, type: class 
org.apache.spark.sql.hive.HiveFunctionRegistry)
- object (class org.apache.spark.sql.hive.HiveContext, 
org.apache.spark.sql.hive.HiveContext@4648e685)
- field (class $iwC$$iwC$$iwC$$iwC, name: hc, type: class 
org.apache.spark.sql.hive.HiveContext)
- object (class $iwC$$iwC$$iwC$$iwC, $iwC$$iwC$$iwC$$iwC@23d652ef)
- field (class $iwC$$iwC$$iwC, name: $iw, type: class 
$iwC$$iwC$$iwC$$iwC)
- object (class $iwC$$iwC$$iwC, $iwC$$iwC$$iwC@71cc14f1)
- field (class $iwC$$iwC, name: $iw, type: class $iwC$$iwC$$iwC)
- object (class $iwC$$iwC, $iwC$$iwC@74eca89e)
- field (class $iwC, name: $iw, type: class $iwC$$iwC)
- object (class $iwC, $iwC@685c4cc4)
- field (class $line9.$read, name: $iw, type: class $iwC)
- object (class $line9.$read, $line9.$read@519f9aae)
- field (class $iwC$$iwC$$iwC, name: $VAL7, type: class 
$line9.$read)
- object (class $iwC$$iwC$$iwC, $iwC$$iwC$$iwC@4b996858)
- field (class $iwC$$iwC$$iwC$$iwC, name: $outer, type: class 
$iwC$$iwC$$iwC)
- object (class $iwC$$iwC$$iwC$$iwC, $iwC$$iwC$$iwC$$iwC@31d646d4)
- field (class $iwC$$iwC$$iwC$$iwC$$anonfun$1, name: $outer, type: 
class $iwC$$iwC$$iwC$$iwC)
- root object (class $iwC$$iwC$$iwC$$iwC$$anonfun$1, <function1>)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1528)

I wrote a simple script to reproduce this problem.

Case 1:
val barr1 = sc.broadcast("test")
val sret = sc.parallelize(1 to 10, 2)
val ret = sret.filter(row => !barr1.equals("test"))
ret.collect.foreach(println)

It works fine in local mode and yarn-client mode.

Case 2:
val barr1 = sc.broadcast("test")
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
val sret = hc.sql("show tables")
val ret = sret.filter(row => !barr1.equals("test"))
ret.collect.foreach(println)

It throws java.io.NotSerializableException: org.apache.spark.sql.hive.HiveContext
in both local mode and yarn-client mode.

But it works fine if I write the same code in a Scala file and run it in
IntelliJ IDEA.

import org.apache.spark.{SparkConf, SparkContext}

object TestBroadcast2 {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("Broadcast Test").setMaster("local[3]")
    val sc = new SparkContext(sparkConf)
    val barr1 = sc.broadcast("test")
    val hc = new org.apache.spark.sql.hive.HiveContext(sc)
    val sret = hc.sql("show tables")
    val ret = sret.filter(row => !barr1.equals("test"))
    ret.collect.foreach(println)
  }
}
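
A hedged workaround sketch for the shell case (general REPL advice, not a fix
confirmed in this thread): mark the HiveContext reference @transient so that,
if the enclosing REPL line object gets pulled into the closure, the
non-serializable context is skipped during serialization.

  val barr1 = sc.broadcast("test")
  // @transient keeps hc out of the serialized closure if the REPL wrapper object is captured
  @transient val hc = new org.apache.spark.sql.hive.HiveContext(sc)
  val sret = hc.sql("show tables")
  val ret = sret.filter(row => !barr1.equals("test"))
  ret.collect.foreach(println)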







Re: Spark Installation

2014-07-09 Thread
Hi Srikrishna

The reason for this issue is that you uploaded the assembly jar to HDFS twice.

Pasting your command here would make it easier to diagnose.
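
A hedged diagnostic sketch (the HDFS path below is a placeholder, not from this
thread): check whether a stale copy of the assembly jar is still sitting on HDFS
and make sure only one consistent copy is referenced.

  # placeholder path; adjust to wherever the assembly jar was uploaded
  hdfs dfs -ls /user/spark/share/lib/
  hdfs dfs -rm /user/spark/share/lib/spark-assembly-1.1.0-SNAPSHOT-hadoop2.3.0.jar
  hdfs dfs -put /opt/spark/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.1.0-SNAPSHOT-hadoop2.3.0.jar /user/spark/share/lib/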



Tian Yi (田毅)
===
Juyun (橘云) Platform Product Line
Big Data Products Department
AsiaInfo-Linkage Technologies (China), Inc.
Mobile: 13910177261
Tel: 010-82166322
Fax: 010-82166617
QQ: 20057509
MSN: yi.t...@hotmail.com
Address: AsiaInfo-Linkage Building, East Zone, No. 10 Dongbeiwang West Road, Haidian District, Beijing


===

On Jul 9, 2014, at 3:03 AM, Srikrishna S srikrishna...@gmail.com wrote:

 Hi All,
 
 
 I tried the make-distribution script and it worked well. I was able to
 compile the Spark binary on our CDH5 cluster. Once I compiled Spark, I
 copied over the binaries in the dist folder to all the other machines
 in the cluster.
 
 However, I run into an issue when submitting a job in yarn-client mode. I
 get an error message that says the following:
 Resource 
 file:/opt/spark/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.1.0-SNAPSHOT-hadoop2.3.0.jar
 changed on src filesystem (expected 1404845211000, was 1404845404000)
 
 My end goal is to submit a job (that uses MLLib) in our Yarn cluster.
 
 Any thoughts anyone?
 
 Regards,
 Krishna
 
 
 
 On Tue, Jul 8, 2014 at 9:49 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
 
 Hi Srikrishna,
 
 The binaries are built with something like
 mvn package -Pyarn -Dhadoop.version=2.3.0-cdh5.0.1 
 -Dyarn.version=2.3.0-cdh5.0.1
 
 -Sandy
 
 
 On Tue, Jul 8, 2014 at 3:14 AM, 田毅 tia...@asiainfo.com wrote:
 
 try this command:
 
 make-distribution.sh --hadoop 2.3.0-cdh5.0.0 --with-yarn --with-hive
 
 
 
 
 Tian Yi (田毅)
 ===
 Juyun (橘云) Platform Product Line
 Big Data Products Department
 AsiaInfo-Linkage Technologies (China), Inc.
 Mobile: 13910177261
 Tel: 010-82166322
 Fax: 010-82166617
 QQ: 20057509
 MSN: yi.t...@hotmail.com
 Address: AsiaInfo-Linkage Building, East Zone, No. 10 Dongbeiwang West Road, Haidian District, Beijing


 ===
 
 On Jul 8, 2014, at 11:53 AM, Krishna Sankar ksanka...@gmail.com wrote:
 
 Couldn't find any reference to CDH in pom.xml (profiles or the
 hadoop.version). I'm also wondering how the CDH-compatible artifact was
 compiled.
 Cheers
 k/
 
 
 On Mon, Jul 7, 2014 at 8:07 PM, Srikrishna S srikrishna...@gmail.com 
 wrote:
 
 Hi All,
 
 Does anyone know what the command line arguments to mvn are to generate
 the pre-built binary for Spark on Hadoop 2 / CDH5?
 
 I would like to pull in a recent bug fix from spark-master and rebuild the
 binaries in exactly the same way as the ones provided on the
 website.
 
 I have tried the following:
 
 mvn install -Pyarn -Dhadoop.version=2.3.0-cdh5.0.1
 
 And it doesn't quite work.
 
 Any thoughts anyone?
 
 
 
 
 



Re: Shark Vs Spark SQL

2014-07-03 Thread
Add MASTER=yarn-client and the JDBC / Thrift server will run on YARN.
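
A minimal launch sketch, assuming the start-thriftserver.sh script from the
branch-1.0-jdbc / Spark 1.1 layout:

  # run from the Spark installation directory
  MASTER=yarn-client ./sbin/start-thriftserver.sh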



2014-07-02 16:57 GMT-07:00 田毅 tia...@asiainfo.com:

 Hi, Matei


 Do you know how to run the JDBC / Thrift server on YARN?


 I did not find any suggestions in the docs.


 2014-07-02 16:06 GMT-07:00 Matei Zaharia matei.zaha...@gmail.com:

 Spark SQL in Spark 1.1 will include all the functionality in Shark; take a
 look at
 http://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html.
 We decided to do this because at the end of the day, the only code left in
 Shark was the JDBC / Thrift server, which is a very small amount of code.
 There’s also a branch of Spark 1.0 that includes this server if you want to
 replace Shark on Spark 1.0:
 https://github.com/apache/spark/tree/branch-1.0-jdbc. The server runs in
 a very similar way to how Shark did.

 Matei

 On Jul 2, 2014, at 3:57 PM, Shrikar archak shrika...@gmail.com wrote:

 As of Spark Summit 2014, they mentioned that there will be no active
 development on Shark.

 Thanks,
 Shrikar


  On Wed, Jul 2, 2014 at 3:53 PM, Subacini B subac...@gmail.com wrote:

 Hi,


 http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/%3cb75376b8-7a57-4161-b604-f919886cf...@gmail.com%3E

 This talks about the Shark backend being replaced with the Spark SQL engine
 in the future.
 Does that mean Spark will continue to support Shark + Spark SQL for the long
 term, or will Shark be decommissioned after some period?

 Thanks
 Subacini







Is it possible to run HiveThriftServer2 based on SparkSQL in YARN now?

2014-07-02 Thread
Hi, everyone!

Is it possible to run HiveThriftServer2 based on SparkSQL in YARN now?

Spark version: branch 1.0-jdbc
YARN version: 2.3.0-cdh5.0.0


Re: Shark Vs Spark SQL

2014-07-02 Thread
Hi, Matei


Do you know how to run the JDBC / Thrift server on YARN?


I did not find any suggestions in the docs.


2014-07-02 16:06 GMT-07:00 Matei Zaharia matei.zaha...@gmail.com:

 Spark SQL in Spark 1.1 will include all the functionality in Shark; take a
 look at
 http://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html.
 We decided to do this because at the end of the day, the only code left in
 Shark was the JDBC / Thrift server, which is a very small amount of code.
 There’s also a branch of Spark 1.0 that includes this server if you want to
 replace Shark on Spark 1.0:
 https://github.com/apache/spark/tree/branch-1.0-jdbc. The server runs in
 a very similar way to how Shark did.

 Matei

 On Jul 2, 2014, at 3:57 PM, Shrikar archak shrika...@gmail.com wrote:

 As of Spark Summit 2014, they mentioned that there will be no active
 development on Shark.

 Thanks,
 Shrikar


 On Wed, Jul 2, 2014 at 3:53 PM, Subacini B subac...@gmail.com wrote:

 Hi,


 http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/%3cb75376b8-7a57-4161-b604-f919886cf...@gmail.com%3E

 This talks about the Shark backend being replaced with the Spark SQL engine in
 the future.
 Does that mean Spark will continue to support Shark + Spark SQL for the long
 term, or will Shark be decommissioned after some period?

 Thanks
 Subacini