Andrew Olson created HIVE-18122:
-----------------------------------

             Summary: HCatInputFormat cannot read any data when non-native table has partition columns
                 Key: HIVE-18122
                 URL: https://issues.apache.org/jira/browse/HIVE-18122
             Project: Hive
          Issue Type: Bug
          Components: HCatalog
            Reporter: Andrew Olson


First, some background: a non-native table can be created with partition 
columns defined. However, partition columns on a non-native table are 
problematic when using {{HCatInputFormat}}. Nothing prevents such a table from 
being created, and the documentation [1] does not mention that partition 
columns are unsupported for non-native tables. In fact, it suggests that 
"PARTITIONED BY" can be specified.

With such a table definition, any job using {{HCatInputFormat}} reads no data 
at all, and the cause is not obvious; it only becomes apparent through 
debugging. The bug is in the {{getInputJobInfo}} method of 
{{org.apache.hive.hcatalog.mapreduce.InitializeInput}}, which identifies the 
partitions to read. With partition columns defined, 
{{table.getPartitionKeys().size()}} is greater than 0, so the method proceeds to 
the {{listPartitionsByFilter(...)}} code. That call never finds any partitions, 
because partitions cannot be added to a non-native table (HIVE-1223). Because 
the "Non partitioned table" path is never taken, the returned {{InputJobInfo}} 
carries an empty {{List<PartInfo>}} instead of a single {{PartInfo}} built from 
the table's {{StorageDescriptor}} and parameters.
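The selection logic described above can be modeled with a small, 
self-contained Java sketch. {{TableInfo}}, {{PartInfoSketch}} and 
{{selectPartitions}} are hypothetical stand-ins, not Hive or HCatalog classes; 
the partition filter itself is not modeled, and the 
{{treatNonNativeAsUnpartitioned}} flag is only one possible guard, assumed here 
for illustration rather than taken from an actual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the partition-selection branch described for
// InitializeInput.getInputJobInfo; none of these types are Hive/HCatalog APIs.
public class PartitionSelectionSketch {

    // Stand-in for a table's metastore metadata.
    static final class TableInfo {
        final List<String> partitionKeys;         // declared partition columns
        final boolean nonNative;                  // true if backed by a storage handler
        final String tableLocation;               // stand-in for the table's StorageDescriptor
        final List<String> registeredPartitions;  // always empty for non-native tables (HIVE-1223)

        TableInfo(List<String> partitionKeys, boolean nonNative,
                  String tableLocation, List<String> registeredPartitions) {
            this.partitionKeys = partitionKeys;
            this.nonNative = nonNative;
            this.tableLocation = tableLocation;
            this.registeredPartitions = registeredPartitions;
        }
    }

    // Stand-in for PartInfo: only records where the data would be read from.
    static final class PartInfoSketch {
        final String location;

        PartInfoSketch(String location) {
            this.location = location;
        }
    }

    // With treatNonNativeAsUnpartitioned == false this follows the reported behavior;
    // with true it applies a hypothetical guard that routes non-native tables to
    // the "Non partitioned table" path.
    static List<PartInfoSketch> selectPartitions(TableInfo table,
                                                 boolean treatNonNativeAsUnpartitioned) {
        boolean usePartitionedPath = !table.partitionKeys.isEmpty()
                && !(treatNonNativeAsUnpartitioned && table.nonNative);
        List<PartInfoSketch> parts = new ArrayList<>();
        if (usePartitionedPath) {
            // Analogue of listPartitionsByFilter(...): nothing is registered for a
            // non-native table, so no PartInfo is produced.
            for (String partitionLocation : table.registeredPartitions) {
                parts.add(new PartInfoSketch(partitionLocation));
            }
        } else {
            // "Non partitioned table" path: one PartInfo built from the table itself.
            parts.add(new PartInfoSketch(table.tableLocation));
        }
        return parts;
    }

    public static void main(String[] args) {
        // A storage-handler-backed table declared with a partition column "dt".
        TableInfo hbaseBacked = new TableInfo(List.of("dt"), true, "hbase:events", List.of());
        System.out.println(selectPartitions(hbaseBacked, false).size()); // prints 0
        System.out.println(selectPartitions(hbaseBacked, true).size());  // prints 1
    }
}
```

With the flag off, the partitioned, storage-handler-backed table yields no 
{{PartInfo}} entries, matching the reported symptom; with it on, the table falls 
through to the single {{PartInfo}} built from the table's own metadata.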

This bug is quite similar to HIVE-18087, although it resides in a different 
layer of Hive.

We encountered this using the {{HBaseStorageHandler}}, although I don't believe 
that's a particularly relevant detail.

[1] https://cwiki.apache.org/confluence/display/Hive/StorageHandlers#StorageHandlers-DDL



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
