Andrew Olson created HIVE-18122:
-----------------------------------
Summary: HCatInputFormat cannot read any data when non-native
table has partition columns
Key: HIVE-18122
URL: https://issues.apache.org/jira/browse/HIVE-18122
Project: Hive
Issue Type: Bug
Components: HCatalog
Reporter: Andrew Olson
First, some background info: A non-native table can be created with partition
columns defined. However, the existence of partition columns for a non-native
table is problematic when using {{HCatInputFormat}}. Nothing disallows the
table creation, and the documentation [1] does not mention that non-native
tables cannot have partition columns. In fact, it suggests that "PARTITIONED
BY" can be specified.
With such a table definition, no job using {{HCatInputFormat}} can ever read
any data. The cause is not immediately obvious and is only revealed via
debugging. The bug stems from the
{{org.apache.hive.hcatalog.mapreduce.InitializeInput}} class's logic in the
{{getInputJobInfo}} method, where it attempts to identify the partitions to
read. With partition columns defined, {{table.getPartitionKeys().size()}} is >
0, so it proceeds to the {{listPartitionsByFilter(...)}} code, which will never
find any partitions because partitions cannot be added to a non-native table
(HIVE-1223). As a result, the returned {{InputJobInfo}} has an empty
{{List<PartInfo>}}. The method never takes the "Non partitioned table" path,
where the table's {{StorageDescriptor}} and parameters are used to build a
singleton {{PartInfo}}.
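To make the branching easier to follow, here is a minimal, self-contained
sketch of the decision described above. It is a simplified model, not the
actual {{InitializeInput}} source: {{PartitionPathSketch}}, {{TableModel}} and
{{resolvePartInfos}} are hypothetical names, and plain strings stand in for
{{PartInfo}} and {{StorageDescriptor}}.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PartitionPathSketch {

    // Hypothetical stand-in for the metastore's view of a table (not a Hive class).
    static final class TableModel {
        final List<String> partitionKeys;        // columns declared via PARTITIONED BY
        final List<String> registeredPartitions; // partitions the metastore knows about
        final String storageDescriptor;          // placeholder for the table-level StorageDescriptor

        TableModel(List<String> partitionKeys,
                   List<String> registeredPartitions,
                   String storageDescriptor) {
            this.partitionKeys = partitionKeys;
            this.registeredPartitions = registeredPartitions;
            this.storageDescriptor = storageDescriptor;
        }
    }

    // Models the choice between the two paths; each returned string stands in for one PartInfo.
    static List<String> resolvePartInfos(TableModel table) {
        if (!table.partitionKeys.isEmpty()) {
            // "Partitioned table" path: only partitions registered in the metastore are used.
            // A non-native table can never have any, so this list is always empty.
            return new ArrayList<>(table.registeredPartitions);
        }
        // "Non partitioned table" path: a singleton built from the table itself.
        return Collections.singletonList(table.storageDescriptor);
    }

    public static void main(String[] args) {
        TableModel withPartitionColumns = new TableModel(
                Collections.singletonList("dt"),
                Collections.<String>emptyList(),
                "table-level-sd");
        TableModel withoutPartitionColumns = new TableModel(
                Collections.<String>emptyList(),
                Collections.<String>emptyList(),
                "table-level-sd");

        System.out.println(resolvePartInfos(withPartitionColumns).size());    // prints 0
        System.out.println(resolvePartInfos(withoutPartitionColumns).size()); // prints 1
    }
}
```

In this model the only difference between the two tables is the declared
partition column, yet the first one yields nothing to read.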
This bug is quite similar to HIVE-18087 although it resides in a different
layer of Hive.
We encountered this using the {{HBaseStorageHandler}}, although I don't believe
that's a particularly relevant detail.
[1]
https://cwiki.apache.org/confluence/display/Hive/StorageHandlers#StorageHandlers-DDL