Back to the first question: does this mean that Hive must be up and running?

When I try it, I get the exception below. The documentation says that
this method works only on a SchemaRDD. I thought that was the reason
countries.saveAsTable did not work, so I created a tmp that contains the
results from the registered temp table, and I could verify that it is a
SchemaRDD, as shown below.

@Judy, I really appreciate your kind support. I want to understand, and
of course I don't want to waste your time. If you can point me to the
documentation describing these details, that would be great.
scala> val tmp = sqlContext.sql("select * from countries")
tmp: org.apache.spark.sql.SchemaRDD =
SchemaRDD[12] at RDD at SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==
PhysicalRDD
[COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29],
MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36
scala> tmp.saveAsTable("Countries")
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved plan found, tree:
'CreateTableAsSelect None, Countries, false, None
 Project [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29]
  Subquery countries
   LogicalRDD [COUNTRY_ID#20,COUNTRY_ISO_CODE#21,COUNTRY_NAME#22,COUNTRY_SUBREGION#23,COUNTRY_SUBREGION_ID#24,COUNTRY_REGION#25,COUNTRY_REGION_ID#26,COUNTRY_TOTAL#27,COUNTRY_TOTAL_ID#28,COUNTRY_NAME_HIST#29], MapPartitionsRDD[9] at mapPartitions at ExistingRDD.scala:36
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:83)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)
at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData$lzycompute(SQLContext.scala:412)
at org.apache.spark.sql.SQLContext$QueryExecution.withCachedData(SQLContext.scala:412)
at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:413)
at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:413)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
at org.apache.spark.sql.SchemaRDDLike$class.saveAsTable(SchemaRDDLike.scala:126)
at org.apache.spark.sql.SchemaRDD.saveAsTable(SchemaRDD.scala:108)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
at $iwC$$iwC$$iwC.<init>(<console>:29)
at $iwC$$iwC.<init>(<console>:31)
at $iwC.<init>(<console>:33)
at <init>(<console>:35)
at .<init>(<console>:39)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:365)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
On Tue, Dec 9, 2014 at 11:44 AM, Cheng Lian <lian.cs....@gmail.com> wrote:
How did you register the table under spark-shell? Two things to
notice:
1. To interact with Hive, HiveContext instead of SQLContext must
be used.
2. `registerTempTable` doesn't persist the table into the Hive
metastore, and the table is lost after quitting spark-shell.
Instead, you must use `saveAsTable`.
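A minimal sketch of what this could look like in spark-shell (a sketch only, assuming a Spark 1.1/1.2-era API; the Country case class and the countries.csv path are hypothetical and stand in for however the CSV is actually loaded):

    import org.apache.spark.sql.hive.HiveContext

    // HiveContext (not SQLContext) is needed so that saveAsTable can
    // persist into the Hive metastore that the Thrift server also reads.
    val hiveContext = new HiveContext(sc)
    import hiveContext.createSchemaRDD

    // Hypothetical schema and file, for illustration only.
    case class Country(id: Int, isoCode: String, name: String)
    val countries = sc.textFile("countries.csv")
      .map(_.split(","))
      .map(c => Country(c(0).toInt, c(1), c(2)))

    // Session-scoped only; nothing is written to the metastore here.
    countries.registerTempTable("countries_tmp")

    // Writes a real metastore table, visible to beeline/JDBC clients
    // connected to a Thrift server backed by the same metastore.
    hiveContext.sql("SELECT * FROM countries_tmp").saveAsTable("countries")

The point is that both the temp table and the saveAsTable call go through the same HiveContext; a SchemaRDD obtained from a plain SQLContext has no metastore to save into, which is why the analyzer rejects the CreateTableAsSelect plan with an "Unresolved plan found" error.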
On 12/9/14 5:27 PM, Anas Mosaad wrote:
Thanks Cheng,
I thought spark-sql uses the exact same metastore, right?
However, it didn't work as expected. Here's what I did:
In spark-shell, I loaded a CSV file and registered the table,
say countries.
Started the Thrift server.
Connected using beeline. When I run show tables or !tables, I get
an empty list of tables, as follows:
0: jdbc:hive2://localhost:10000> !tables
+------------+--------------+-------------+-------------+----------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | TABLE_TYPE  | REMARKS  |
+------------+--------------+-------------+-------------+----------+
+------------+--------------+-------------+-------------+----------+
0: jdbc:hive2://localhost:10000> show tables ;
+---------+
| result  |
+---------+
+---------+
No rows selected (0.106 seconds)
0: jdbc:hive2://localhost:10000>
Kindly advise, what am I missing? I want to read the RDD using
SQL from outside spark-shell (i.e., like any other relational
database).
On Tue, Dec 9, 2014 at 11:05 AM, Cheng Lian <lian.cs....@gmail.com> wrote:
Essentially, the Spark SQL JDBC Thrift server is just a Spark
port of HiveServer2. You don't need to run Hive, but you do
need a working Metastore.
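As a quick check that the shell actually has a working metastore, something like the following sketch can be run from spark-shell (note that, as far as I understand, without a hive-site.xml on the classpath Spark falls back to an embedded Derby metastore created in the current working directory, so spark-shell and the Thrift server can easily end up looking at two different metastores):

    import org.apache.spark.sql.hive.HiveContext

    // Connects to whatever metastore is configured via hive-site.xml,
    // or to a local embedded Derby metastore (./metastore_db) if none
    // is found on the classpath.
    val hiveContext = new HiveContext(sc)

    // Lists the tables the metastore knows about; tables persisted with
    // saveAsTable should show up here and in beeline alike.
    hiveContext.sql("SHOW TABLES").collect().foreach(println)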
On 12/9/14 3:59 PM, Anas Mosaad wrote:
Thanks Judy, this is exactly what I'm looking for. However,
and please forgive me if it's a dumb question: it seems to
me that Thrift is the same as the hive2 JDBC driver, so does this
mean that starting the Thrift server will also start Hive on the server?
On Mon, Dec 8, 2014 at 9:11 PM, Judy Nash <judyn...@exchange.microsoft.com> wrote:
You can use the Thrift server for this purpose, then test it
with beeline.
See doc:
https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server
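For completeness, here is a rough sketch of a plain JDBC client against the Thrift server (assuming the Hive JDBC driver and its dependencies are on the classpath; localhost:10000 is just the default host/port):

    import java.sql.DriverManager

    // The Thrift server speaks the HiveServer2 protocol, so the standard
    // Hive JDBC driver is used to connect to it.
    Class.forName("org.apache.hive.jdbc.HiveDriver")

    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "", "")
    val stmt = conn.createStatement()
    val rs = stmt.executeQuery("SHOW TABLES")
    while (rs.next()) {
      println(rs.getString(1))
    }
    rs.close(); stmt.close(); conn.close()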
*From:* Anas Mosaad [mailto:anas.mos...@incorta.com]
*Sent:* Monday, December 8, 2014 11:01 AM
*To:* user@spark.apache.org
*Subject:* Spark-SQL JDBC driver
Hello Everyone,
I'm brand new to Spark and was wondering if there's a
JDBC driver to access Spark SQL directly. I'm running
Spark in standalone mode and don't have Hadoop in this
environment.
--
*Best Regards/أطيب المنى,*
*Anas Mosaad*
--
*Best Regards/أطيب المنى,*
*Anas Mosaad*
*Incorta Inc.*
*+20-100-743-4510*
--
*Best Regards/أطيب المنى,*
*Anas Mosaad*
*Incorta Inc.*
*+20-100-743-4510*
--
*Best Regards/أطيب المنى,*
*Anas Mosaad*
*Incorta Inc.*
*+20-100-743-4510*