[ https://issues.apache.org/jira/browse/KUDU-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17613067#comment-17613067 ]
ASF subversion and git services commented on KUDU-3401:
-------------------------------------------------------
Commit 3eb4607e4e04044f92a2100b44574d730c6e05e6 in kudu's branch
refs/heads/master from mammadli.khazar
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=3eb4607e4 ]
KUDU-3401 Fix table creation with HMS Integration
Hive queries on Kudu Tables were failing with the following stack trace:
ERROR : Failed
org.apache.hadoop.hive.metastore.api.MetaException: java.lang.ClassNotFoundException Class not found
at org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:98)
at org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:77)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:331)
The issue was caused by the Kudu HMS client not sending the fields required
by Hive, namely the input format, output format and serialization library of
the created table, when making a create table request. As a result, queries run
through Hive on Kudu tables failed because these fields were missing from the
HMS backend database.
This patch adds the missing input/output formats and serialization
library to table creation with Kudu HMS integration. The patch also extends
the current test cases to cover the added fields.
Manually tested on a separate cluster by creating a Kudu table with several
columns via "stored as kudu". Confirmed that the missing data is sent by
checking the parameters of the create_table request in the Hive log files, and
that the data is written to the HMS backend database by checking the SDS table
for INPUT_FORMAT and OUTPUT_FORMAT and the SERDES table for SLIB for the newly
created Kudu table. Ran a few Hive queries on the created Kudu tables and
confirmed that no errors are present.
Change-Id: Ia1b53b55005e2899d8575b0fb7250351d914afb4
Reviewed-on: http://gerrit.cloudera.org:8080/19026
Reviewed-by: Alexey Serbin <[email protected]>
Reviewed-by: Zoltan Chovan <[email protected]>
Tested-by: Attila Bukor <[email protected]>
Reviewed-by: Attila Bukor <[email protected]>
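The manual check of the HMS backend database described in the commit message
can be sketched as a join across the TBLS, SDS and SERDES tables. The snippet
below is only an illustration, not part of the patch: it uses an in-memory
SQLite database with a heavily simplified stand-in for the HMS schema (only the
columns needed for the check, with join keys assumed from the usual HMS
layout), and the helper names and demo table names are hypothetical.

```python
import sqlite3

# Simplified stand-in for the HMS backend schema. Assumption: only the
# columns needed for this check are modelled; the real schema has many more.
SCHEMA = """
CREATE TABLE SERDES (SERDE_ID INTEGER PRIMARY KEY, SLIB TEXT);
CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, INPUT_FORMAT TEXT,
                  OUTPUT_FORMAT TEXT, SERDE_ID INTEGER);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, TBL_NAME TEXT, SD_ID INTEGER);
"""

# Select tables whose input format, output format or SerDe library is
# NULL or an empty string.
CHECK_QUERY = """
SELECT t.TBL_NAME
FROM TBLS t
JOIN SDS s ON s.SD_ID = t.SD_ID
JOIN SERDES d ON d.SERDE_ID = s.SERDE_ID
WHERE COALESCE(s.INPUT_FORMAT, '') = ''
   OR COALESCE(s.OUTPUT_FORMAT, '') = ''
   OR COALESCE(d.SLIB, '') = ''
ORDER BY t.TBL_NAME
"""


def find_incomplete_tables(conn):
    """Return the names of tables with unset formats or SerDe library."""
    return [row[0] for row in conn.execute(CHECK_QUERY)]


def build_demo_db():
    """Create an in-memory database with one broken and one complete table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO SERDES VALUES (?, ?)",
        [(1, None), (2, "org.apache.hadoop.hive.kudu.KuduSerDe")],
    )
    conn.executemany(
        "INSERT INTO SDS VALUES (?, ?, ?, ?)",
        [
            (1, None, None, 1),
            (2, "org.apache.hadoop.hive.kudu.KuduInputFormat",
             "org.apache.hadoop.hive.kudu.KuduOutputFormat", 2),
        ],
    )
    conn.executemany(
        "INSERT INTO TBLS VALUES (?, ?, ?)",
        [(1, "kudu_test_before_fix", 1), (2, "kudu_test_after_fix", 2)],
    )
    return conn
```

Calling find_incomplete_tables(build_demo_db()) reports only the demo table
whose storage fields were left empty. Against a real HMS backend the same
query shape would need to be adapted to the database in use.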
> Unable to query Kudu tables from Hive with Kudu HMS Integration enabled
> -----------------------------------------------------------------------
>
> Key: KUDU-3401
> URL: https://issues.apache.org/jira/browse/KUDU-3401
> Project: Kudu
> Issue Type: Bug
> Components: hms
> Reporter: Khazar Mammadli
> Assignee: Khazar Mammadli
> Priority: Major
>
> When Kudu HMS integration is enabled, several fields are missing from the
> HMS entry of a table created in Impala via a "stored as kudu" query. This
> results in a ClassNotFoundException when trying to query the table from Hive
> after creating it:
>
> {code:java}
> ERROR : Failed
> org.apache.hadoop.hive.metastore.api.MetaException: java.lang.ClassNotFoundException Class not found
> at org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:98) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
> at org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:77) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
> at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:331) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
> {code}
>
> When the following sample query is run in Impala to create a Kudu table with
> Kudu HMS integration enabled, the table is created with the InputFormat,
> OutputFormat and SerDe Library fields missing:
>
> {code:sql}
> create table default.kudu_test (
>   col1 string comment 'col1',
>   col2 string comment 'col2',
>   primary key (col1)
> )
> comment 'kudu_test'
> stored as kudu;
> {code}
>
> |SerDe Library:| |NULL|
> |InputFormat:| |NULL|
> |OutputFormat:| |NULL|
> Hive Metastore log for the table creation:
> INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-5-thread-124]:
> 134: source:172.25.35.0 create_table: Table(tableName:kudu_test,
> dbName:default, owner:root, createTime:0, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:col1, type:string, comment:col1),
> FieldSchema(name:col2, type:string, comment:col2)], location:, inputFormat:,
> outputFormat:, compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:,
> serializationLib:, parameters:{}), bucketCols:[], sortCols:[],
> parameters:{}), partitionKeys:[],
> parameters:{kudu.table_name=default.kudu_test,
> kudu.table_id=5ac46856863f402fb69941ce7b967945, comment=,
> kudu.master_addresses=c3549-node2.coelab.cloudera.com:7051,
> storage_handler=org.apache.hadoop.hive.kudu.KuduStorageHandler,
> kudu.cluster_id=65c8dfbc8b75485db1328ab42f55fa07}, viewOriginalText:,
> viewExpandedText:, tableType:MANAGED_TABLE, temporary:false, ownerType:USER)
> With Kudu HMS integration disabled, on the other hand, the same query in
> Impala creates the table with these fields populated:
> |SerDe Library:|org.apache.hadoop.hive.kudu.KuduSerDe|NULL|
> |InputFormat:|org.apache.hadoop.hive.kudu.KuduInputFormat|NULL|
> |OutputFormat:|org.apache.hadoop.hive.kudu.KuduOutputFormat|NULL|
> Hive Metastore log for table creation:
> INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-5-thread-173]:
> 183: source:172.25.35.0 create_table_req: Table(tableName:kudu_test,
> dbName:default, owner:root, createTime:0, lastAccessTime:0, retention:0,
> sd:StorageDescriptor(cols:[FieldSchema(name:col1, type:string, comment:col1),
> FieldSchema(name:col2, type:string, comment:col2)], location:null,
> inputFormat:org.apache.hadoop.hive.kudu.KuduInputFormat,
> outputFormat:org.apache.hadoop.hive.kudu.KuduOutputFormat, compressed:false,
> numBuckets:0, serdeInfo:SerDeInfo(name:null,
> serializationLib:org.apache.hadoop.hive.kudu.KuduSerDe, parameters:{}),
> bucketCols:[], sortCols:[], parameters:null), partitionKeys:[],
> parameters:{comment=kudu_test_lbodor_no_hms_integration,
> kudu.master_addresses=c3549-node2.coelab.cloudera.com,
> storage_handler=org.apache.hadoop.hive.kudu.KuduStorageHandler,
> kudu.table_name=impala::default.kudu_test}, viewOriginalText:null,
> viewExpandedText:null, tableType:MANAGED_TABLE, catName:hive, ownerType:USER,
> accessType:8)
> --------------------------------
> Code path for table creation when Kudu HMS integration is enabled (Kudu code
> path):
> Quick recap of the steps when creating a Kudu table:
> HMSCatalog::CreateTable() -> hive::Table declared and passed to
> PopulateTable(… , &table) -> Thrift client Execute call ->
> HMSClient::CreateTable(Table(one that just got populated),
> envcontext(default)) ->
> hms_client.create_table_with_environment_context(table, envcontext).
> CreateTable
> [https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_catalog.cc#L146]
> ->
> Populate the fields of table
> [https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_catalog.cc#L367]
> Hms client call
> [https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_client.cc#L280]
> -----------------------------
> Code path for table creation when Kudu HMS integration is disabled (Impala
> code path):
> CreateTable -> CreateMetaStoreTable
> [https://github.com/apache/impala/blob/da3d6fc7f7c656b118bb3570cedf7d7c3158bd0b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L3191]
> ->line 3248 tbl.setSd(createSd(params));
> CreateSd
> [https://github.com/apache/impala/blob/da3d6fc7f7c656b118bb3570cedf7d7c3158bd0b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L3260|https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/fe/src/main/java/org/apache/impala/catalog/HiveStorageDescriptorFactory.java#L36]
>
> Comparing the code paths shows that the missing fields are filled in with
> default values by CreateSd when the table is created without Kudu HMS
> integration (through Impala).
> These fields are left untouched when the table is created with Kudu HMS
> integration enabled (Kudu code path).
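
The difference between the two code paths can be illustrated with a small
Python model. This is a sketch only: it is neither the Kudu C++ code nor the
actual patch. The dataclasses loosely mirror the StorageDescriptor and
SerDeInfo structures seen in the Hive Metastore log, the class names are the
ones the Impala code path writes, and the function names are hypothetical.

```python
from dataclasses import dataclass, field

# Class names as they appear in the Hive Metastore log for the Impala path.
KUDU_INPUT_FORMAT = "org.apache.hadoop.hive.kudu.KuduInputFormat"
KUDU_OUTPUT_FORMAT = "org.apache.hadoop.hive.kudu.KuduOutputFormat"
KUDU_SERDE_LIB = "org.apache.hadoop.hive.kudu.KuduSerDe"


@dataclass
class SerDeInfo:
    # Illustrative model of the serdeInfo part of the create_table request.
    serialization_lib: str = ""


@dataclass
class StorageDescriptor:
    # Illustrative model of the sd part of the create_table request.
    input_format: str = ""
    output_format: str = ""
    serde_info: SerDeInfo = field(default_factory=SerDeInfo)


def populate_kudu_storage_fields(sd):
    """Fill in the three fields Hive needs to resolve the deserializer."""
    sd.input_format = KUDU_INPUT_FORMAT
    sd.output_format = KUDU_OUTPUT_FORMAT
    sd.serde_info.serialization_lib = KUDU_SERDE_LIB
    return sd


def missing_fields(sd):
    """List the request fields that are still empty, using the log's names."""
    checks = [
        ("inputFormat", sd.input_format),
        ("outputFormat", sd.output_format),
        ("serializationLib", sd.serde_info.serialization_lib),
    ]
    return [name for name, value in checks if not value]
```

A freshly constructed StorageDescriptor() corresponds to the request logged
with HMS integration enabled (all three fields empty); after
populate_kudu_storage_fields() it matches the fields logged for the Impala
code path.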