Khazar Mammadli created KUDU-3401:
-------------------------------------
Summary: Unable to query Kudu tables from Hive with Kudu HMS
Integration enabled
Key: KUDU-3401
URL: https://issues.apache.org/jira/browse/KUDU-3401
Project: Kudu
Issue Type: Bug
Components: hms
Reporter: Khazar Mammadli
Assignee: Khazar Mammadli
When Kudu HMS integration is enabled, a table created from Impala with a
"stored as kudu" query is registered in the Hive Metastore with several fields
missing. Querying the table from Hive afterwards fails with a
ClassNotFoundException:
{code:java}
ERROR : Failed
org.apache.hadoop.hive.metastore.api.MetaException: java.lang.ClassNotFoundException Class not found
    at org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:98) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:77) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:331) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
{code}
Running the following sample query in Impala with Kudu HMS integration enabled
creates the table with the InputFormat, OutputFormat and SerDe Library fields
missing:
{code:sql}
create table default.kudu_test (
col1 string comment 'col1',
col2 string comment 'col2',
primary key (col1)
)
comment 'kudu_test'
stored as kudu;{code}
|SerDe Library:| |NULL|
|InputFormat:| |NULL|
|OutputFormat:| |NULL|
Hive Metastore log for the table creation:
INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-5-thread-124]: 134:
source:172.25.35.0 create_table: Table(tableName:kudu_test, dbName:default,
owner:root, createTime:0, lastAccessTime:0, retention:0,
sd:StorageDescriptor(cols:[FieldSchema(name:col1, type:string, comment:col1),
FieldSchema(name:col2, type:string, comment:col2)], location:, inputFormat:,
outputFormat:, compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:,
serializationLib:, parameters:{}), bucketCols:[], sortCols:[], parameters:{}),
partitionKeys:[], parameters:{kudu.table_name=default.kudu_test,
kudu.table_id=5ac46856863f402fb69941ce7b967945, comment=,
kudu.master_addresses=c3549-node2.coelab.cloudera.com:7051,
storage_handler=org.apache.hadoop.hive.kudu.KuduStorageHandler,
kudu.cluster_id=65c8dfbc8b75485db1328ab42f55fa07}, viewOriginalText:,
viewExpandedText:, tableType:MANAGED_TABLE, temporary:false, ownerType:USER)
Running the same query in Impala with Kudu HMS integration disabled, on the
other hand, creates the table with these fields populated:
|SerDe Library:|org.apache.hadoop.hive.kudu.KuduSerDe|NULL|
|InputFormat:|org.apache.hadoop.hive.kudu.KuduInputFormat|NULL|
|OutputFormat:|org.apache.hadoop.hive.kudu.KuduOutputFormat|NULL|
Hive Metastore log for table creation:
INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-5-thread-173]: 183:
source:172.25.35.0 create_table_req: Table(tableName:kudu_test, dbName:default,
owner:root, createTime:0, lastAccessTime:0, retention:0,
sd:StorageDescriptor(cols:[FieldSchema(name:col1, type:string, comment:col1),
FieldSchema(name:col2, type:string, comment:col2)], location:null,
inputFormat:org.apache.hadoop.hive.kudu.KuduInputFormat,
outputFormat:org.apache.hadoop.hive.kudu.KuduOutputFormat, compressed:false,
numBuckets:0, serdeInfo:SerDeInfo(name:null,
serializationLib:org.apache.hadoop.hive.kudu.KuduSerDe, parameters:{}),
bucketCols:[], sortCols:[], parameters:null), partitionKeys:[],
parameters:{comment=kudu_test_lbodor_no_hms_integration,
kudu.master_addresses=c3549-node2.coelab.cloudera.com,
storage_handler=org.apache.hadoop.hive.kudu.KuduStorageHandler,
kudu.table_name=impala::default.kudu_test}, viewOriginalText:null,
viewExpandedText:null, tableType:MANAGED_TABLE, catName:hive, ownerType:USER,
accessType:8)
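
The difference between the two create_table requests can be checked mechanically. The following is a minimal Python sketch, not part of Kudu, Impala or Hive: plain dictionaries stand in for the HMS StorageDescriptor, the helper name missing_sd_fields is hypothetical, and the field values are copied from the two log entries above.

```python
# Illustrative only: plain dicts stand in for the HMS StorageDescriptor;
# this does not use the Hive Metastore client API.
REQUIRED_SD_FIELDS = ("inputFormat", "outputFormat", "serializationLib")


def missing_sd_fields(sd):
    """Return the required storage-descriptor fields that are empty or None."""
    return [name for name in REQUIRED_SD_FIELDS if not sd.get(name)]


# Values copied from the create_table log with Kudu HMS integration enabled.
sd_hms_integration_enabled = {
    "inputFormat": "",
    "outputFormat": "",
    "serializationLib": "",
}

# Values copied from the create_table_req log with the integration disabled.
sd_hms_integration_disabled = {
    "inputFormat": "org.apache.hadoop.hive.kudu.KuduInputFormat",
    "outputFormat": "org.apache.hadoop.hive.kudu.KuduOutputFormat",
    "serializationLib": "org.apache.hadoop.hive.kudu.KuduSerDe",
}

print(missing_sd_fields(sd_hms_integration_enabled))
print(missing_sd_fields(sd_hms_integration_disabled))
```

With the integration enabled all three fields are reported as missing, which matches the empty serializationLib that Hive later fails to load.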
--------------------------------
Code path for table creation when Kudu HMS integration is enabled (Kudu code path):
Quick recap of the steps when creating a Kudu table:
HMSCatalog::CreateTable() -> hive::Table declared and passed to
PopulateTable(…, &table) -> Thrift client Execute call ->
HMSClient::CreateTable(table (the one that just got populated),
envcontext (default)) ->
hms_client.create_table_with_environment_context(table, envcontext).
CreateTable
[https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_catalog.cc#L146] ->
Populate the fields of the table
[https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_catalog.cc#L367] ->
HMS client call
[https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_client.cc#L280]
-----------------------------
Code path for table creation when Kudu HMS integration is disabled (Impala
code path):
CreateTable -> CreateMetaStoreTable
[https://github.com/apache/impala/blob/da3d6fc7f7c656b118bb3570cedf7d7c3158bd0b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L3191]
-> line 3248: tbl.setSd(createSd(params));
CreateSd
[https://github.com/apache/impala/blob/da3d6fc7f7c656b118bb3570cedf7d7c3158bd0b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L3260|https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/fe/src/main/java/org/apache/impala/catalog/HiveStorageDescriptorFactory.java#L36]
Comparing the two code paths shows that, when the table is created without
Kudu HMS integration (through Impala), the missing fields are filled in with
default values by CreateSd. When Kudu HMS integration is enabled (Kudu code
path), these fields are left untouched during table creation.
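
One way to picture the missing step is a function that fills the three fields whenever the table's storage_handler parameter names the Kudu storage handler. The Python sketch below only models that idea and is an assumption about what a fix could look like, not Kudu's or Impala's actual code: the function name with_kudu_sd_defaults and the dictionary layout are hypothetical, while the class names are the ones seen in the log with the integration disabled.

```python
# Hypothetical sketch: models filling storage-descriptor defaults for Kudu
# tables. It is not the Kudu or Impala implementation.
KUDU_STORAGE_HANDLER = "org.apache.hadoop.hive.kudu.KuduStorageHandler"

# Class names as they appear in the HMS log with the integration disabled.
KUDU_SD_DEFAULTS = {
    "inputFormat": "org.apache.hadoop.hive.kudu.KuduInputFormat",
    "outputFormat": "org.apache.hadoop.hive.kudu.KuduOutputFormat",
    "serializationLib": "org.apache.hadoop.hive.kudu.KuduSerDe",
}


def with_kudu_sd_defaults(table):
    """Return a copy of `table` whose empty sd fields are filled for Kudu tables."""
    params = table.get("parameters", {})
    sd = dict(table.get("sd", {}))  # copy so the caller's dict is not mutated
    if params.get("storage_handler") == KUDU_STORAGE_HANDLER:
        for field, default in KUDU_SD_DEFAULTS.items():
            if not sd.get(field):  # only fill fields that are empty or absent
                sd[field] = default
    return {**table, "sd": sd}


# Table shaped like the request logged with Kudu HMS integration enabled.
table = {
    "tableName": "kudu_test",
    "sd": {"inputFormat": "", "outputFormat": "", "serializationLib": ""},
    "parameters": {"storage_handler": KUDU_STORAGE_HANDLER},
}
fixed = with_kudu_sd_defaults(table)
print(fixed["sd"])
```

Fields that are already set are left alone, so a table that carries its own SerDe or formats would not be overwritten.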