Venugopal Reddy K created HIVE-28145:
----------------------------------------
Summary: getPartitionsByNames API returns partition objects with
empty values in many fields when it is executed concurrently with dropPartition
API
Key: HIVE-28145
URL: https://issues.apache.org/jira/browse/HIVE-28145
Project: Hive
Issue Type: Bug
Reporter: Venugopal Reddy K
*Description:*
getPartitionsByNames API returns partition objects with empty values in many
fields when it is executed concurrently with dropPartition API.
The org.apache.hadoop.hive.metastore.MetaStoreDirectSql#getPartitionsViaPartNames
method issues multiple queries to the backend database to populate the fields of
the partition object. It first queries for the partition ids by partition name,
then joins the PARTITIONS, SDS, and SERDES tables for those ids and creates the
partition objects. A further query against the PARTITION_KEY_VALS table fetches
the partition values for those ids and sets them on the already created objects.
If a partition is dropped just before the PARTITION_KEY_VALS query, the returned
partition object has empty values. The same can happen for any other field of
the partition object that is populated by its own query, such as partition
params, storage descriptor params, serde params, sort cols, bucket cols, and
skewed cols.
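
The interleaving described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: the class PartitionRaceSketch, its maps, and its methods are hypothetical stand-ins for the PARTITIONS and PARTITION_KEY_VALS tables and are not Hive code. The drop is injected between the two lookup steps to make the race deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionRaceSketch {
    // Hypothetical in-memory stand-ins for two metastore tables.
    static final Map<Long, String> PARTITIONS = new HashMap<>();               // part id -> part name
    static final Map<Long, List<String>> PARTITION_KEY_VALS = new HashMap<>(); // part id -> part values

    // Simplified partition object: only the fields needed to show the race.
    static final class Part {
        final String name;
        List<String> values = new ArrayList<>();
        Part(String name) { this.name = name; }
    }

    static void addPartition(long id, String name, List<String> values) {
        PARTITIONS.put(id, name);
        PARTITION_KEY_VALS.put(id, values);
    }

    static void dropPartition(long id) {
        PARTITIONS.remove(id);
        PARTITION_KEY_VALS.remove(id);
    }

    // Mimics the multi-query read. The betweenQueries hook runs after the
    // objects are created and stands in for a concurrent, committed drop.
    static List<Part> getPartitions(List<Long> ids, Runnable betweenQueries) {
        Map<Long, Part> byId = new LinkedHashMap<>();
        // Query 1: create partition objects for the ids that exist.
        for (Long id : ids) {
            String name = PARTITIONS.get(id);
            if (name != null) {
                byId.put(id, new Part(name));
            }
        }
        betweenQueries.run();
        // Query 2: fill in partition values for the objects created above.
        for (Map.Entry<Long, Part> e : byId.entrySet()) {
            List<String> vals = PARTITION_KEY_VALS.get(e.getKey());
            if (vals != null) {
                e.getValue().values = vals;
            }
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        addPartition(1L, "dt=2024-01-01", List.of("2024-01-01"));
        addPartition(2L, "dt=2024-01-02", List.of("2024-01-02"));
        // Partition 2 is dropped after query 1 and before query 2.
        List<Part> parts = getPartitions(List.of(1L, 2L), () -> dropPartition(2L));
        for (Part p : parts) {
            System.out.println(p.name + " -> " + p.values);
        }
    }
}
```

In this model the second partition is still returned, but with an empty values list, which mirrors the symptom reported here.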
*Note: The issue can be observed with both the directsql and the JDO-based query
paths. All APIs that issue multiple queries to the backend database within a
transaction need to be checked.*
*Root Cause:*
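The transaction is opened with the default isolation level, which in DataNucleus
is read-committed. Under read-committed, each query within the transaction sees
the data committed at the time that query runs, so a dropPartition committed
between two of the queries is visible to the later ones.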
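
The effect of the isolation level can be illustrated with a minimal in-memory sketch. It contrasts a read-committed reader with a snapshot-style reader, which is the behaviour some databases provide at stricter isolation levels. The class and type names here (IsolationSketch, ReadCommittedTxn, SnapshotTxn) are hypothetical and model only the visibility rule, not DataNucleus or any real database.

```java
import java.util.HashMap;
import java.util.Map;

public class IsolationSketch {
    // Hypothetical committed state of one table: part id -> partition value.
    static final Map<Long, String> COMMITTED = new HashMap<>();

    // Read-committed: every statement reads whatever is committed at that moment.
    static final class ReadCommittedTxn {
        String read(long id) {
            return COMMITTED.get(id);
        }
    }

    // Snapshot-style: every statement reads the state captured at transaction start.
    static final class SnapshotTxn {
        private final Map<Long, String> snapshot = new HashMap<>(COMMITTED);

        String read(long id) {
            return snapshot.get(id);
        }
    }

    public static void main(String[] args) {
        COMMITTED.put(1L, "2024-01-01");
        ReadCommittedTxn rc = new ReadCommittedTxn();
        SnapshotTxn snap = new SnapshotTxn();

        String rcFirst = rc.read(1L);
        String snapFirst = snap.read(1L);
        COMMITTED.remove(1L); // a concurrent drop commits between the two reads

        System.out.println("read-committed: " + rcFirst + " then " + rc.read(1L));
        System.out.println("snapshot:       " + snapFirst + " then " + snap.read(1L));
    }
}
```

The read-committed reader sees the row on its first read and null on its second, while the snapshot reader sees the same value both times. Whether a stricter level is an acceptable fix for the metastore is not decided here.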
*Steps to reproduce:*
# Create a partitioned table and add 500 to 1000 dynamic partitions (a dummy
partition param, sd param, and serde param can be added).
# Create a thread pool of size 2 and submit 2 tasks: one task that calls
getPartitionsByNames and another that calls dropPartition in a loop.
# Verify the fields in the partition objects returned from getPartitionsByNames().
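
The reproduction steps could be driven by a harness along the following lines. This is a rough sketch, not the actual test: PartitionClient is a hypothetical interface standing in for the metastore client, and FakeClient is an in-memory substitute that performs the same two-step read so the harness can run without a metastore. Against a real metastore, the two interface methods would wrap the real getPartitionsByNames and dropPartition calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ReproHarnessSketch {
    // Hypothetical stand-in for the metastore client; not the real Hive API.
    interface PartitionClient {
        // Returns the partition values of each partition object found.
        List<List<String>> getPartitionsByNames(List<String> names);
        void dropPartition(String name);
    }

    // In-memory fake that reads in two steps, like the method in the description.
    static final class FakeClient implements PartitionClient {
        private final Map<String, Boolean> partitions = new ConcurrentHashMap<>();
        private final Map<String, List<String>> keyVals = new ConcurrentHashMap<>();

        void add(String name, String value) {
            partitions.put(name, Boolean.TRUE);
            keyVals.put(name, List.of(value));
        }

        @Override
        public List<List<String>> getPartitionsByNames(List<String> names) {
            List<String> found = new ArrayList<>();
            for (String n : names) {               // step 1: which partitions exist
                if (partitions.containsKey(n)) {
                    found.add(n);
                }
            }
            List<List<String>> result = new ArrayList<>();
            for (String n : found) {               // step 2: fetch their values
                result.add(keyVals.getOrDefault(n, List.of()));
            }
            return result;
        }

        @Override
        public void dropPartition(String name) {
            partitions.remove(name);
            keyVals.remove(name);
        }
    }

    // Runs one reader task and one dropper task on a pool of size 2 and returns
    // how many returned partition objects had empty values.
    static int run(PartitionClient client, List<String> names) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Integer> reader = pool.submit(() -> {
                int empty = 0;
                for (int i = 0; i < names.size(); i++) {
                    for (List<String> vals : client.getPartitionsByNames(names)) {
                        if (vals.isEmpty()) {
                            empty++;
                        }
                    }
                }
                return empty;
            });
            Future<?> dropper = pool.submit(() -> {
                for (String n : names) {
                    client.dropPartition(n);
                }
            });
            dropper.get();
            return reader.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        FakeClient client = new FakeClient();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            names.add("p=" + i);
            client.add("p=" + i, String.valueOf(i));
        }
        System.out.println("partition objects with empty values: " + run(client, names));
    }
}
```

Because the outcome depends on thread scheduling, the count varies between runs and may be zero; looping the run several times makes the race more likely to show up.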