Joseph Barefoot created DRILL-3588:
--------------------------------------
Summary: Write back to Hive Metastore
Key: DRILL-3588
URL: https://issues.apache.org/jira/browse/DRILL-3588
Project: Apache Drill
Issue Type: Improvement
Reporter: Joseph Barefoot
Priority: Critical
This feature is particularly important to us here at AtScale in order to
leverage Drill as a query engine option for our BI on Hadoop solution.
Currently you can connect to the Hive Metastore and query its databases and
tables without issue. However, if you create a table, it is created in HDFS but
no metadata is written to the Hive Metastore, so the table is not easily
visible to any other tool.
When you read schemas from a Hive data source via Drill, they are prefixed with
"hive.". This namespacing makes sense to us given how Drill works. Ideally it
would work symmetrically when you create tables with the same prefix: Drill
would map the prefix to the target data source (in this case Hive) and write
the schema information back to the Hive Metastore. Our specific use case is
Create Table As Select (CTAS); however, ideally any DDL statement against a
Hive data source schema/table would write back to the Hive Metastore.
The reason it's important to have the metadata in the Hive Metastore is that
we have found many of our customers use multiple SQL tools to access data
tracked in the Metastore. For example, even if Impala is their primary
SQL-on-Hadoop engine for clients/tools, they may run Spark jobs that manipulate
data via RDDs and locate that data by referencing the Metastore. Organizations
using a lot of SQL on Hadoop have come to expect this sort of interoperability
between Hive, Spark, and Impala. Supporting it within Drill will help drive
adoption within the Hadoop community (besides making it a lot easier for us to
use Drill effectively from within our BI engine).