[jira] [Commented] (SPARK-21063) Spark return an empty result from remote hadoop cluster

Paul Staab (JIRA) Tue, 19 Jun 2018 04:59:19 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-21063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516990#comment-16516990
 ]


Paul Staab commented on SPARK-21063:
------------------------------------

I was able to find a workaround for this problem on Spark 2.1.0:

1. Create a Hive dialect that quotes column names with backticks, the quoting 
Hive expects:
{code:java}
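import org.apache.spark.sql.jdbc.JdbcDialect

object HiveDialect extends JdbcDialect {

  // Apply this dialect to HiveServer2 JDBC URLs only
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2")

  // Quote identifiers with backticks instead of the default double quotes
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}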
{code}
This is taken from https://github.com/apache/spark/pull/19238

 

2. Register it before calling spark.read.jdbc:
{code:java}
JdbcDialects.registerDialect(HiveDialect)
{code}
3. Execute spark.read.jdbc with the fetchsize option (shown here in PySpark):
{code:python}
spark.read.jdbc(
    "jdbc:hive2://localhost:10000/default",
    "test1",
    properties={"driver": "org.apache.hive.jdbc.HiveDriver", "fetchsize": "10"}
).show()
{code}
It only works when both conditions are met: the dialect is registered and 
fetchsize is set. There was a pull request to add the dialect to Spark by 
default:

[https://github.com/apache/spark/pull/19238]

Unfortunately, it was not merged.
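
As a rough illustration of what the dialect in step 1 changes, the sketch below builds the kind of SELECT statement a JDBC reader might send, once with double-quoted column names (the style of Spark's default dialect) and once with backticks. The helper names default_quote, hive_quote and build_select are hypothetical and are not Spark APIs; the statement shape is simplified. The assumption here is that Hive reads a double-quoted token as a string literal rather than a column reference, which is why backticks are needed.

```python
# Hypothetical helpers that mimic how a JDBC dialect quotes identifiers.
# They are illustrations only and are not part of Spark.


def default_quote(col_name):
    """Double-quote style, as used by Spark's default dialect."""
    return '"{}"'.format(col_name)


def hive_quote(col_name):
    """Backtick style, as used by the HiveDialect from step 1."""
    return '`{}`'.format(col_name)


def build_select(table, columns, quote):
    """Assemble a simplified SELECT over the quoted column names."""
    column_list = ",".join(quote(c) for c in columns)
    return "SELECT {} FROM {}".format(column_list, table)


if __name__ == "__main__":
    columns = ["test_col"]
    # With double quotes, Hive is assumed to see a string literal.
    print(build_select("test1", columns, default_quote))
    # With backticks, test_col is an identifier in HiveQL.
    print(build_select("test1", columns, hive_quote))
```

This only shows the quoting difference; it does not cover the fetchsize part of the workaround, which still has to be set on the read itself.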

> Spark return an empty result from remote hadoop cluster
> -------------------------------------------------------
>
>                 Key: SPARK-21063
>                 URL: https://issues.apache.org/jira/browse/SPARK-21063
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.1.0, 2.1.1
>            Reporter: Peter Bykov
>            Priority: Major
>
> Spark returns an empty result when querying a remote Hadoop cluster.
> All firewall settings have been removed.
> Querying over plain JDBC works properly with the hive-jdbc driver, version 1.1.1.
> The code snippet is:
> {code:java}
> val spark = SparkSession.builder
>     .appName("RemoteSparkTest")
>     .master("local")
>     .getOrCreate()
> val df = spark.read
>   .option("url", "jdbc:hive2://remote.hive.local:10000/default")
>   .option("user", "user")
>   .option("password", "pass")
>   .option("dbtable", "test_table")
>   .option("driver", "org.apache.hive.jdbc.HiveDriver")
>   .format("jdbc")
>   .load()
>  
> df.show()
> {code}
> Result:
> {noformat}
> +-------------------+
> |test_table.test_col|
> +-------------------+
> +-------------------+
> {noformat}
> All manipulations, such as
> {code:java}
> df.select("*").show()
> {code}
> return an empty result too.



