[
https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009033#comment-15009033
]
Cheng Lian commented on SPARK-9686:
-----------------------------------
Tested 1.7-SNAPSHOT
([fa13301|https://github.com/apache/spark/commit/fa13301ae440c4c9594280f236bcca11b62fdd29])
under several different configurations using Hive 1.2.1 and a small JDBC
testing program (attached at the end).
# Embedded metastore
Remove {{conf/hive-site.xml}}, start the Thrift server using
{{./sbin/start-thriftserver.sh}}, and execute the test program.
#- {{getSchemas()}} only returns {{default}}
#- {{getColumns}} returns nothing.
# Local metastore
Configure {{conf/hive-site.xml}} to point to a local PostgreSQL-backed Hive
1.2.1 metastore.
Leave {{hive.metastore.uris}} empty (i.e., disable the remote metastore). Start
the Thrift server and execute the test program.
#- {{getSchemas()}} only returns {{default}}
#- {{getColumns}} returns nothing.
# Remote metastore
Configure {{conf/hive-site.xml}} to point to a remote PostgreSQL-backed
Hive 1.2.1 metastore.
Set {{hive.metastore.uris}} to {{thrift://localhost:9083}}. Start the metastore
service using {{$HIVE_HOME/bin/hive --service metastore}}, start the Thrift server,
and execute the test program.
#- {{getSchemas()}} returns all defined databases.
#- {{getColumns}} returns columns defined in all tables.
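For reference, the remote metastore setup above can be sketched as a minimal
{{conf/hive-site.xml}} fragment (the property names are standard Hive configuration
keys; the PostgreSQL connection URL, database name, and driver value are placeholders
for my local setup):
{code:xml}
<configuration>
  <!-- Point the Thrift server at the standalone metastore service -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
  <!-- PostgreSQL metastore backend (placeholder connection URL) -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://localhost:5432/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
  </property>
</configuration>
{code}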
However, this doesn't imply that using a remote metastore works around this issue.
After some investigation, I think there are two separate but related issues:
# In {{HiveThriftServer2}}, although all SQL commands are properly dispatched to
the metadata Hive client and the execution Hive client, conventional JDBC
metadata calls still go through the default {{HiveServer2}} implementation (e.g.
{{getSchemas()}} is handled by
{{o.a.hive.service.cli.CLIService.getSchemas()}}). These calls are not
dispatched and are always executed by the execution Hive client, which points to
the dummy local Derby metastore.
We should override the corresponding methods in {{SparkSQLCLIService}} and
dispatch these JDBC calls to the metastore Hive client.
# When using a remote metastore, the execution Hive client is somehow initialized
to point to the actual remote metastore instead of the dummy local Derby metastore.
I haven't figured out the root cause, but single-step debugging shows that
the execution Hive client does point to the remote metastore. My guess is that
{{hive.metastore.uris}} takes higher precedence than
{{javax.jdo.option.ConnectionURL}} and overrides the latter when a {{Hive}}
object is being initialized.
It's because of this issue that the 3rd test mentioned above gives the
correct answers. This issue can be reliably reproduced on my local machine.
However, according to [~navis]'s comment, the remote metastore didn't work for him
either, probably because of other environmental factors.
I'm filing a separate JIRA ticket for this one.
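To illustrate the first issue, here is a minimal, self-contained sketch of the
dispatching idea (the {{HiveClient}}, {{ExecutionClient}}, and {{MetadataClient}}
types below are hypothetical stand-ins, not actual Spark classes; the real fix
would override methods such as {{getSchemas()}} in {{SparkSQLCLIService}} and
delegate to the metastore Hive client):
{code}
// Hypothetical model of the two Hive clients held by the Thrift server.
trait HiveClient {
  def listSchemas(): Seq[String]
}

// Execution client: points to the dummy local Derby metastore,
// which only knows about the "default" database.
object ExecutionClient extends HiveClient {
  def listSchemas(): Seq[String] = Seq("default")
}

// Metadata client: points to the user-configured metastore,
// which knows about all databases.
object MetadataClient extends HiveClient {
  def listSchemas(): Seq[String] = Seq("default", "sales", "logs")
}

object DispatchSketch {
  // Current (buggy) behavior: JDBC metadata calls fall through to the
  // default HiveServer2 implementation, backed by the execution client.
  def buggyGetSchemas(): Seq[String] = ExecutionClient.listSchemas()

  // Proposed behavior: override the metadata call and dispatch it to
  // the metadata client instead.
  def fixedGetSchemas(): Seq[String] = MetadataClient.listSchemas()

  def main(args: Array[String]): Unit = {
    println(buggyGetSchemas().mkString(", "))  // only "default"
    println(fixedGetSchemas().mkString(", "))  // all databases
  }
}
{code}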
The JDBC testing program:
{code}
import java.sql.DriverManager

object JDBCExperiments {
  def main(args: Array[String]): Unit = {
    val url = "jdbc:hive2://localhost:10000/default"
    val username = "lian"
    val password = ""

    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val connection = DriverManager.getConnection(url, username, password)

    try {
      val metadata = connection.getMetaData

      // Columns 1 and 2: TABLE_SCHEM, TABLE_CATALOG
      val schemas = metadata.getSchemas()
      while (schemas.next()) {
        val (schema, catalog) = (schemas.getString(1), schemas.getString(2))
        println(s"$schema: $catalog")
      }

      // Columns 1-5: TABLE_CAT, TABLE_SCHEM, TABLE_NAME, TABLE_TYPE, REMARKS
      val tables = metadata.getTables(null, null, null, null)
      while (tables.next()) {
        val fields = Array.tabulate(5) { i => tables.getString(i + 1) }
        println(fields.mkString(", "))
      }

      // Columns 3, 4, 6: TABLE_NAME, COLUMN_NAME, TYPE_NAME
      val columns = metadata.getColumns(null, null, null, null)
      while (columns.next()) {
        println((columns.getString(3), columns.getString(4), columns.getString(6)))
      }
    } finally {
      connection.close()
    }
  }
}
{code}
> Spark hive jdbc client cannot get table from metadata store
> -----------------------------------------------------------
>
> Key: SPARK-9686
> URL: https://issues.apache.org/jira/browse/SPARK-9686
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1
> Reporter: pin_zhang
> Assignee: Cheng Lian
> Attachments: SPARK-9686.1.patch.txt
>
>
> 1. Start start-thriftserver.sh
> 2. Connect with beeline
> 3. Create a table
> 4. Run SHOW TABLES; the newly created table is returned
> 5. Run the following JDBC program:
> Class.forName("org.apache.hive.jdbc.HiveDriver");
> String URL = "jdbc:hive2://localhost:10000/default";
> Properties info = new Properties();
> Connection conn = DriverManager.getConnection(URL, info);
> ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(),
> null, null, null);
> Problem:
> No tables are returned by this API, though it works in Spark 1.3.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)