[
https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009033#comment-15009033
]
Cheng Lian commented on SPARK-9686:
-----------------------------------
Tested 1.7-SNAPSHOT
([fa13301|https://github.com/apache/spark/commit/fa13301ae440c4c9594280f236bcca11b62fdd29])
under several different configurations using Hive 1.2.1 and a small JDBC
testing program (attached at the end).
# Embedded metastore
Remove {{conf/hive-site.xml}}, start the Thrift server using
{{./sbin/start-thriftserver.sh}}, and execute the test program.
#- {{getSchemas()}} only returns {{default}}
#- {{getColumns}} returns nothing.
# Local metastore
Configure {{conf/hive-site.xml}} to point to a local PostgreSQL-backed Hive
1.2.1 metastore.
Leave {{hive.metastore.uris}} empty (i.e., disable the remote metastore). Start
the Thrift server and execute the test program.
#- {{getSchemas()}} only returns {{default}}
#- {{getColumns}} returns nothing.
# Remote metastore
Configure {{conf/hive-site.xml}} to point to a remote PostgreSQL-backed
Hive 1.2.1 metastore.
Set {{hive.metastore.uris}} to {{thrift://localhost:9083}}. Start the metastore
service using {{$HIVE_HOME/bin/hive --service metastore}}, start the Thrift server,
and execute the test program.
#- {{getSchemas()}} returns all defined databases.
#- {{getColumns}} returns columns defined in all tables.
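For reference, the remote metastore setup above can be sketched as a minimal
{{conf/hive-site.xml}} fragment (the property names are standard Hive configuration
keys; the PostgreSQL connection URL, database name, and driver value are placeholders
for my local setup):
{code:xml}
<configuration>
  <!-- Point the Thrift server at the standalone metastore service -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
  <!-- PostgreSQL metastore backend (placeholder connection URL) -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://localhost:5432/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
  </property>
</configuration>
{code}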
However, this doesn't imply that using a remote metastore works around this issue.
After some investigation, I think there are two separate but related issues:
# In {{HiveThriftServer2}}, although all SQL commands are properly dispatched to
the metadata Hive client and the execution Hive client, conventional JDBC
metadata calls still go through the default {{HiveServer2}} implementation (e.g.
{{getSchemas()}} is handled by
{{o.a.hive.service.cli.CLIService.getSchemas()}}). These calls are not
dispatched and are always executed by the execution Hive client, which points to
the dummy local Derby metastore.
We should override the corresponding methods in {{SparkSQLCLIService}} and
dispatch these JDBC calls to the metastore Hive client.
# When using a remote metastore, the execution Hive client is somehow initialized
to point to the actual remote metastore instead of the dummy local Derby metastore.
I haven't figured out the root cause, but single-step debugging shows that
the execution Hive client does point to the remote metastore. My guess is that
{{hive.metastore.uris}} takes higher precedence than
{{javax.jdo.option.ConnectionURL}} and overrides the latter when a {{Hive}}
object is being initialized.
It's because of this issue that the 3rd test mentioned above gives the
correct answers. This issue can be reliably reproduced on my local machine.
However, according to [~navis]'s comment, the remote metastore didn't work for him
either, probably because of other environmental factors.
I'm filing a separate JIRA ticket for this one.
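To illustrate the first issue, here is a minimal, self-contained sketch of the
dispatching idea (the {{HiveClient}}, {{ExecutionClient}}, and {{MetadataClient}}
types below are hypothetical stand-ins, not actual Spark classes; the real fix
would override methods such as {{getSchemas()}} in {{SparkSQLCLIService}} and
delegate to the metastore Hive client):
{code}
// Hypothetical model of the two Hive clients held by the Thrift server.
trait HiveClient {
  def listSchemas(): Seq[String]
}

// Execution client: points to the dummy local Derby metastore,
// which only knows about the "default" database.
object ExecutionClient extends HiveClient {
  def listSchemas(): Seq[String] = Seq("default")
}

// Metadata client: points to the user-configured metastore,
// which knows about all databases.
object MetadataClient extends HiveClient {
  def listSchemas(): Seq[String] = Seq("default", "sales", "logs")
}

object DispatchSketch {
  // Current (buggy) behavior: JDBC metadata calls fall through to the
  // default HiveServer2 implementation, backed by the execution client.
  def buggyGetSchemas(): Seq[String] = ExecutionClient.listSchemas()

  // Proposed behavior: override the metadata call and dispatch it to
  // the metadata client instead.
  def fixedGetSchemas(): Seq[String] = MetadataClient.listSchemas()

  def main(args: Array[String]): Unit = {
    println(buggyGetSchemas().mkString(", "))  // only "default"
    println(fixedGetSchemas().mkString(", "))  // all databases
  }
}
{code}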
The JDBC testing program:
{code}
import java.sql.DriverManager

object JDBCExperiments {
  def main(args: Array[String]): Unit = {
    val url = "jdbc:hive2://localhost:10000/default"
    val username = "lian"
    val password = ""

    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val connection = DriverManager.getConnection(url, username, password)

    try {
      val metadata = connection.getMetaData

      // Columns 1 and 2: TABLE_SCHEM, TABLE_CATALOG
      val schemas = metadata.getSchemas()
      while (schemas.next()) {
        val (schema, catalog) = (schemas.getString(1), schemas.getString(2))
        println(s"$schema: $catalog")
      }

      // Columns 1-5: TABLE_CAT, TABLE_SCHEM, TABLE_NAME, TABLE_TYPE, REMARKS
      val tables = metadata.getTables(null, null, null, null)
      while (tables.next()) {
        val fields = Array.tabulate(5) { i => tables.getString(i + 1) }
        println(fields.mkString(", "))
      }

      // Columns 3, 4, 6: TABLE_NAME, COLUMN_NAME, TYPE_NAME
      val columns = metadata.getColumns(null, null, null, null)
      while (columns.next()) {
        println((columns.getString(3), columns.getString(4), columns.getString(6)))
      }
    } finally {
      connection.close()
    }
  }
}
{code}
> Spark hive jdbc client cannot get table from metadata store
> -----------------------------------------------------------
>
> Key: SPARK-9686
> URL: https://issues.apache.org/jira/browse/SPARK-9686
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1
> Reporter: pin_zhang
> Assignee: Cheng Lian
> Attachments: SPARK-9686.1.patch.txt
>
>
> 1. Start start-thriftserver.sh
> 2. Connect with beeline
> 3. Create a table
> 4. Run SHOW TABLES; the newly created table is returned
> 5. Run the following JDBC program:
> Class.forName("org.apache.hive.jdbc.HiveDriver");
> String URL = "jdbc:hive2://localhost:10000/default";
> Properties info = new Properties();
> Connection conn = DriverManager.getConnection(URL, info);
> ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(),
> null, null, null);
> Problem:
> No tables are returned by this API, though it works in Spark 1.3.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)