[
https://issues.apache.org/jira/browse/SPARK-21063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516990#comment-16516990
]
Paul Staab commented on SPARK-21063:
------------------------------------
I was able to find a workaround for this problem on Spark 2.1.0:
1. Create a Hive dialect that uses the correct quotes (backticks) for escaping
column names:
{code:java}
import org.apache.spark.sql.jdbc.JdbcDialect

// Hive expects backticks around column names.
object HiveDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2")
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}
{code}
This is taken from https://github.com/apache/spark/pull/19238
2. Register it before making the call with spark.read.jdbc
{code:java}
import org.apache.spark.sql.jdbc.JdbcDialects
JdbcDialects.registerDialect(HiveDialect)
{code}
3. Execute spark.read.jdbc with the fetchsize option (shown here in PySpark syntax):
{code:java}
spark.read.jdbc("jdbc:hive2://localhost:10000/default", "test1",
    properties={"driver": "org.apache.hive.jdbc.HiveDriver", "fetchsize": "10"}).show()
{code}
The workaround only takes effect when the dialect is registered and fetchsize
is set; either step alone is not enough. A pull request to add the dialect to
Spark by default
[https://github.com/apache/spark/pull/19238]
was unfortunately not merged.
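The three steps above can be combined into a single Scala program. This is a minimal sketch, not a tested fix: it assumes Spark 2.x with the Hive JDBC driver on the classpath, reuses the URL and table name from step 3, and the object name HiveJdbcWorkaround and the application name are hypothetical. The Logging mixin is left out because nothing in the dialect logs.

```scala
import java.util.Properties

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Step 1: dialect that quotes column names with backticks for Hive.
object HiveDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2")
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}

// Hypothetical entry point; the name is illustrative only.
object HiveJdbcWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("HiveJdbcWorkaround") // hypothetical application name
      .master("local")
      .getOrCreate()

    // Step 2: register the dialect before the first JDBC read.
    JdbcDialects.registerDialect(HiveDialect)

    // Step 3: pass the driver class and fetchsize as connection properties.
    val props = new Properties()
    props.setProperty("driver", "org.apache.hive.jdbc.HiveDriver")
    props.setProperty("fetchsize", "10")

    spark.read
      .jdbc("jdbc:hive2://localhost:10000/default", "test1", props)
      .show()

    spark.stop()
  }
}
```

The dialect must be registered before the read so that Spark picks it up when it builds the query for a jdbc:hive2 URL.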
> Spark return an empty result from remote hadoop cluster
> -------------------------------------------------------
>
> Key: SPARK-21063
> URL: https://issues.apache.org/jira/browse/SPARK-21063
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
> Affects Versions: 2.1.0, 2.1.1
> Reporter: Peter Bykov
> Priority: Major
>
> Spark returns an empty result when querying a remote Hadoop cluster.
> All firewall settings have been removed.
> Querying directly over JDBC works properly with the hive-jdbc driver from version 1.1.1.
> The code snippet is:
> {code:java}
> import org.apache.spark.sql.SparkSession
>
> val spark = SparkSession.builder
>   .appName("RemoteSparkTest")
>   .master("local")
>   .getOrCreate()
> val df = spark.read
>   .option("url", "jdbc:hive2://remote.hive.local:10000/default")
>   .option("user", "user")
>   .option("password", "pass")
>   .option("dbtable", "test_table")
>   .option("driver", "org.apache.hive.jdbc.HiveDriver")
>   .format("jdbc")
>   .load()
>
> df.show()
> {code}
> Result:
> {noformat}
> +-------------------+
> |test_table.test_col|
> +-------------------+
> +-------------------+
> {noformat}
> Other operations on the DataFrame, such as:
> {code:java}
> df.select("*").show()
> {code}
> return an empty result too.