automatically infer existing partitions of a table from HDFS files.
-----------------------------------------------------------------
Key: HIVE-493
URL: https://issues.apache.org/jira/browse/HIVE-493
Project: Hadoop Hive
Issue Type: New Feature
Components: Metastore, Query Processor
Affects Versions: 0.3.0, 0.3.1, 0.4.0
Reporter: Prasad Chakka
Initially, the partition list for a table was inferred from the HDFS directory
structure instead of being looked up in the metastore, where partitions are
created using 'alter table ... add partition'. But this inference was removed
in favor of the metadata lookup, both for the metastore checker and to
facilitate external partitions.
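Hive lays out each partition's data in key=value subdirectories of the table
directory, so inference amounts to parsing those path components. A minimal
sketch of the idea, with illustrative paths and helper names that are not
Hive's actual API:

```python
# Sketch: infer partition specs from HDFS-style directory paths.
# Hive stores each partition under key=value subdirectories, e.g.
# /warehouse/logs/ds=2009-05-01/hr=12. All names here are hypothetical.

def infer_partitions(table_dir, paths):
    """Parse key=value path components under table_dir into partition specs."""
    specs = []
    for path in paths:
        if not path.startswith(table_dir):
            continue  # skip paths outside the table directory
        rest = path[len(table_dir):].strip("/")
        spec = {}
        for component in rest.split("/"):
            if "=" in component:
                key, _, value = component.partition("=")
                spec[key] = value
        if spec:
            specs.append(spec)
    return specs

paths = [
    "/warehouse/logs/ds=2009-05-01/hr=12",
    "/warehouse/logs/ds=2009-05-01/hr=13",
]
print(infer_partitions("/warehouse/logs", paths))
```

This recovers only the partition key/value pairs; anything else the metastore
would normally record (schema, location overrides) is lost, which is the crux
of the issues below.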
Joydeep and Frederick mentioned that it would be simpler for users to create
the HDFS directory and let Hive infer the partition rather than explicitly add
one. But doing so raises the following issues:
1) External partitions -- we would have to mix both approaches, so the
partition list becomes a merged list of inferred and registered partitions,
and duplicates have to be resolved.
2) Partition-level schemas can't be supported. Which schema should be chosen
for an inferred partition: the table schema at the time the inferred partition
was created, or the latest table schema? And how would we know the table
schema at the time the inferred partition was created?
3) If partitions have to be registered, a partition can be disabled without
actually deleting its data. This feature is not supported yet and may not be
that useful, but nevertheless it can't be supported with inferred partitions.
4) Indexes are being added, so if partitions are not registered then indexes
for such partitions cannot be maintained automatically.
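The merge in point 1 could be sketched as follows. The tie-break rule here
(registered entries shadow inferred ones, since the metastore entry may carry
extra metadata such as a schema) is an assumption for illustration, not
documented Hive behavior:

```python
# Sketch: merge registered (metastore) and inferred (HDFS) partition lists,
# resolving duplicates in favor of the registered entry. The data shapes
# and the tie-break rule are illustrative assumptions.

def merge_partitions(registered, inferred):
    """Key by partition name; registered entries shadow inferred ones."""
    merged = {}
    for part in inferred:
        merged[part["name"]] = part
    for part in registered:  # applied second, so registered wins on conflict
        merged[part["name"]] = part
    return list(merged.values())

registered = [{"name": "ds=2009-05-01", "source": "metastore"}]
inferred = [
    {"name": "ds=2009-05-01", "source": "hdfs"},
    {"name": "ds=2009-05-02", "source": "hdfs"},
]
for p in merge_partitions(registered, inferred):
    print(p["name"], p["source"])
```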
I would like to know the general thinking about this among users of Hive. If
inferred partitions are preferred, can we live with the restricted
functionality that this imposes?
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.