Andrew Olson created HIVE-18122:
-----------------------------------

             Summary: HCatInputFormat cannot read any data when non-native table has partition columns
                 Key: HIVE-18122
                 URL: https://issues.apache.org/jira/browse/HIVE-18122
             Project: Hive
          Issue Type: Bug
          Components: HCatalog
            Reporter: Andrew Olson

First, some background info: A non-native table can be created with partition columns defined. However, the existence of partition columns on a non-native table is problematic when using {{HCatInputFormat}}. Nothing disallows the table creation, and the documentation [1] does not mention that non-native tables cannot have partition columns. In fact, it suggests that "PARTITIONED BY" can be specified.

With such a table definition, no job using {{HCatInputFormat}} can ever read any data. The cause is not immediately obvious and is only revealed by debugging.

The bug stems from the logic in the {{getInputJobInfo}} method of the {{org.apache.hive.hcatalog.mapreduce.InitializeInput}} class, where it attempts to identify the partitions to read. With partition columns defined, {{table.getPartitionKeys().size()}} is greater than 0, so it proceeds to the {{listPartitionsByFilter(...)}} code, which will never find any partitions because partitions cannot be added to a non-native table (HIVE-1223). The returned {{InputJobInfo}} therefore holds an empty {{List<PartInfo>}}. The code never takes the "Non partitioned table" path, where the table's {{StorageDescriptor}} and parameters are used to build a singleton {{PartInfo}}.

This bug is quite similar to HIVE-18087, although it resides in a different layer of Hive.

We encountered this using the {{HBaseStorageHandler}}, although I don't believe that's a particularly relevant detail.

[1] https://cwiki.apache.org/confluence/display/Hive/StorageHandlers#StorageHandlers-DDL
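To make the branching described above easier to follow, here is a minimal, self-contained Java sketch. It is a model of the behavior, not the actual {{InitializeInput}} source: {{TableModel}}, {{planAsDescribed}}, {{planWithGuard}} and the string placeholders standing in for {{PartInfo}} are all hypothetical names invented for illustration. It also assumes that a non-native table can be recognized by the presence of a {{storage_handler}} table parameter. The guarded variant is only one possible direction for a fix, not necessarily the one Hive should adopt.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionPlanningSketch {

    /** Hypothetical stand-in for the metastore table; holds only what the sketch needs. */
    static final class TableModel {
        final List<String> partitionKeys;
        final Map<String, String> parameters;
        // Partitions registered in the metastore. For a non-native table this
        // stays empty, because partitions cannot be added to it.
        final List<String> registeredPartitions;

        TableModel(List<String> partitionKeys,
                   Map<String, String> parameters,
                   List<String> registeredPartitions) {
            this.partitionKeys = partitionKeys;
            this.parameters = parameters;
            this.registeredPartitions = registeredPartitions;
        }

        // Assumption: a non-native table carries a "storage_handler" parameter.
        boolean isNonNative() {
            return parameters.get("storage_handler") != null;
        }
    }

    /** Models the reported behavior: the branch depends only on the partition keys. */
    static List<String> planAsDescribed(TableModel table) {
        List<String> partInfos = new ArrayList<>();
        if (!table.partitionKeys.isEmpty()) {
            // Stands in for listPartitionsByFilter(...): nothing is ever found
            // for a non-native table, so the result stays empty.
            partInfos.addAll(table.registeredPartitions);
        } else {
            // "Non partitioned table" path: one PartInfo built from the table itself.
            partInfos.add("table-level PartInfo");
        }
        return partInfos;
    }

    /** Hypothetical guard: treat a non-native table as unpartitioned for planning. */
    static List<String> planWithGuard(TableModel table) {
        if (table.partitionKeys.isEmpty() || table.isNonNative()) {
            return Collections.singletonList("table-level PartInfo");
        }
        return new ArrayList<>(table.registeredPartitions);
    }

    /** Builds a model of an HBase-backed table declared with PARTITIONED BY. */
    static TableModel nonNativePartitionedTable() {
        Map<String, String> params = new HashMap<>();
        params.put("storage_handler", "org.apache.hadoop.hive.hbase.HBaseStorageHandler");
        List<String> keys = Collections.singletonList("ds");
        return new TableModel(keys, params, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        TableModel table = nonNativePartitionedTable();
        System.out.println("as described: " + planAsDescribed(table).size() + " PartInfo(s)");
        System.out.println("with guard:   " + planWithGuard(table).size() + " PartInfo(s)");
    }
}
```

In this model the first method yields no {{PartInfo}} entries for the non-native table, so a job would have no input to read, while the guarded method falls back to the single table-level entry.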