[jira] [Commented] (SPARK-22113) Dataset shows in Hive is inconsistent with JDBC

Michael Fu (JIRA) Mon, 25 Sep 2017 23:44:09 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-22113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180330#comment-16180330
 ]


Michael Fu commented on SPARK-22113:
------------------------------------

[~viirya] Ok, This becomes interesting.... I did another test according to your 
case which prefix the column with table name. But failed by an 
_org.apache.spark.sql.AnalysisException_
{code:java}
ds.select("dwd_dim_date.dw_date").limit(10).show();
{code}

{code:java}
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot 
resolve '`dwd_dim_date.dw_date`' given input columns: [week_of_year, dw_month, 
last_12_full_month, day_of_month, check_week, dw_quarter, season_year, 
festival, etl_time, check_month, dw_date, last_full_month, dw_week, dw_date1, 
mth_year, dw_date2, week_year, last_6_full_month, day_of_year, dw_year];;
'Project ['dwd_dim_date.dw_date]
+- 
Relation[dw_date#25,dw_date1#26,dw_date2#27,dw_year#28,dw_month#29,day_of_month#30,day_of_year#31,dw_week#32,week_of_year#33,dw_quarter#34,festival#35,last_full_month#36,last_6_full_month#37,last_12_full_month#38,etl_time#39,check_month#40,check_week#41,week_year#42,mth_year#43,season_year#44]
 JDBCRelation(tfdw.dwd_dim_date) [numPartitions=1]
{code}
Hive version

{code:java}
Hive 1.1.0-cdh5.11.1
{code}
Probably Hive version causes the difference. Anyhow, I will avoid using JDBC 
with Hive since we don't know whether spark would support this(And it's not 
recommended). So I close this issue.


> Dataset shows in Hive is inconsistent with JDBC
> -----------------------------------------------
>
>                 Key: SPARK-22113
>                 URL: https://issues.apache.org/jira/browse/SPARK-22113
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>         Environment: version 2.2.0
>            Reporter: Michael Fu
>
> I am trying to query data from Hive in spark. According spark-sql document, 
> there're two ways to do this:
> The first way is Init session with _enableHiveSupport_
> {code:java}
> SparkSession session = 
> SparkSession.builder().enableHiveSupport().getOrCreate();
> session.sql("select dw_date from tfdw.dwd_dim_date limit 10").show();
> {code}
> the dataset shows the correct result
> !https://i.stack.imgur.com/gBJCj.png!
> The second way is through JDBC
> {code:java}
> Dataset<Row> ds = session.read()
>                   .format("jdbc")
>                   .option("driver", "org.apache.hive.jdbc.HiveDriver")
>                   .option("url", 
> "jdbc:hive2://iZ11syxr6afZ:21050/;auth=noSasl")
>                   .option("dbtable", "tfdw.dwd_dim_date")
>                   .load();
> ds.select("dw_date").limit(10).show();
> {code}
> But the dataset only show the column name in the result rather than the data 
> in the column
> !https://i.stack.imgur.com/FBMDN.png!
> The two pictures should be consistent I think. Any outstanding I missed ? 
> Many thanks!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (SPARK-22113) Dataset shows in Hive is inconsistent with JDBC

Reply via email to