Matthew Walton created SPARK-21179:
--------------------------------------

             Summary: Unable to return Hive INT data type into Spark SQL via Hive JDBC driver: Caused by: java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to int.
                 Key: SPARK-21179
                 URL: https://issues.apache.org/jira/browse/SPARK-21179
             Project: Spark
          Issue Type: Bug
          Components: Spark Shell, SQL
    Affects Versions: 2.0.0, 1.6.0
         Environment: OS:  Linux
HDP version: 2.5.0.1-60
Hive version: 1.2.1
Spark version: 2.0.0.2.5.0.1-60
JDBC: latest Hortonworks Hive JDBC driver
            Reporter: Matthew Walton


I'm trying to fetch data back into Spark SQL over a JDBC connection to Hive.
Unfortunately, whenever the query touches an INT column, I get the following
error:

17/06/22 12:14:37 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to int.  

Steps to reproduce:

1) In Hive, create a simple table with an INT column and insert some data (I
used the SQuirreL Client with the Hortonworks JDBC driver):

create table wh2.hivespark (country_id int, country_name string);
insert into wh2.hivespark values (1, 'USA');

2) Copy the Hortonworks Hive JDBC driver jar files to the machine where you
will run the Spark shell

3) Start the Spark shell, loading the Hortonworks Hive JDBC driver jar files:

./spark-shell --jars /home/spark/jdbc/hortonworkshive/HiveJDBC41.jar,/home/spark/jdbc/hortonworkshive/TCLIServiceClient.jar,/home/spark/jdbc/hortonworkshive/commons-codec-1.3.jar,/home/spark/jdbc/hortonworkshive/commons-logging-1.1.1.jar,/home/spark/jdbc/hortonworkshive/hive_metastore.jar,/home/spark/jdbc/hortonworkshive/hive_service.jar,/home/spark/jdbc/hortonworkshive/httpclient-4.1.3.jar,/home/spark/jdbc/hortonworkshive/httpcore-4.1.3.jar,/home/spark/jdbc/hortonworkshive/libfb303-0.9.0.jar,/home/spark/jdbc/hortonworkshive/libthrift-0.9.0.jar,/home/spark/jdbc/hortonworkshive/log4j-1.2.14.jar,/home/spark/jdbc/hortonworkshive/ql.jar,/home/spark/jdbc/hortonworkshive/slf4j-api-1.5.11.jar,/home/spark/jdbc/hortonworkshive/slf4j-log4j12-1.5.11.jar,/home/spark/jdbc/hortonworkshive/zookeeper-3.4.6.jar

4) In the Spark shell, load the data from Hive using the JDBC driver:

val hivespark = spark.read.format("jdbc")
  .options(Map(
    "url" -> "jdbc:hive2://localhost:10000/wh2;AuthMech=3;UseNativeQuery=1;user=hdfs;password=hdfs",
    "dbtable" -> "wh2.hivespark"))
  .option("driver", "com.simba.hive.jdbc41.HS2Driver")
  .option("user", "hdfs")
  .option("password", "hdfs")
  .load()

5) In the Spark shell, try to display the data:

hivespark.show()

At this point you should see the error:

scala> hivespark.show()
17/06/22 12:14:37 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.sql.SQLDataException: [Simba][JDBC](10140) Error converting value to int.
        at com.simba.hiveserver2.exceptions.ExceptionConverter.toSQLException(Unknown Source)
        at com.simba.hiveserver2.utilities.conversion.TypeConverter.toInt(Unknown Source)
        at com.simba.hiveserver2.jdbc.common.SForwardResultSet.getInt(Unknown Source)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.getNext(JDBCRDD.scala:437)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.hasNext(JDBCRDD.scala:535)
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
        at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
        at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:784)
        at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:784)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
        at org.apache.spark.scheduler.Task.run(Task.scala:85)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
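
Note that the failure is in the row-fetch path (SForwardResultSet.getInt), not
in schema resolution: Spark resolves the JDBC schema with a query that returns
no rows (SELECT * FROM wh2.hivespark WHERE 1=0), so I would expect the schema
itself to come back correctly, something like:

scala> hivespark.printSchema()
root
 |-- country_id: integer (nullable = true)
 |-- country_name: string (nullable = true)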

Note: I also tested this issue using a JDBC driver from Progress DataDirect,
and I see a similar error message, so this does not seem to be driver-specific:

scala> hivespark.show()
17/06/22 12:07:59 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
java.sql.SQLException: [DataDirect][Hive JDBC Driver]Value can not be converted to requested type.

Also, if I query this table directly from the SQuirreL Client tool, there is
no error.
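
Since the error is not driver-specific and the same query works when issued
directly, my suspicion is the SQL that Spark's JDBC data source generates: it
quotes column names with double quotes (e.g. SELECT "country_id","country_name"
FROM wh2.hivespark), and HiveQL treats double-quoted tokens as string literals
rather than identifiers, so the driver ends up trying to convert the literal
string "country_id" to an INT. If that is the cause, registering a custom
JdbcDialect that quotes identifiers with backticks before calling load() should
work around it. A minimal sketch (assuming the quoting hypothesis is correct;
HiveDialect is my own name, not a Spark built-in):

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Quote identifiers HiveQL-style (backticks) instead of the default
// double quotes, which Hive parses as string literals.
object HiveDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2")
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}

JdbcDialects.registerDialect(HiveDialect)

With the dialect registered, the generated query should become
SELECT `country_id`,`country_name` FROM wh2.hivespark, which Hive parses as
column references instead of literals.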


