[jira] Updated: (HIVE-493) automatically infer existing partitions of table from HDFS files.

Prasad Chakka (JIRA) Sun, 17 May 2009 18:12:11 -0700

     [ https://issues.apache.org/jira/browse/HIVE-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Prasad Chakka updated HIVE-493:
-------------------------------

    Description: 
Initially, the partition list for a table was inferred from the HDFS directory 
structure instead of being looked up in the metastore (where partitions are 
created using 'alter table ... add partition'). This automatic inference was 
removed in favor of the latter approach when the metastore checker feature was 
checked in, and also to facilitate external partitions.
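
To make the inference idea concrete, here is a minimal sketch of reading a 
partition spec off a directory path. It assumes the usual key=value directory 
naming under the table root; the function name, the example paths, and the 
partition keys are hypothetical and are not taken from the Hive code base.

```python
from pathlib import PurePosixPath


def infer_partition_spec(table_root, data_dir, partition_keys):
    """Return {key: value} for data_dir, or None if it is not a partition dir.

    Assumes each directory level below table_root is named key=value and that
    the levels appear in the same order as partition_keys.
    """
    try:
        relative = PurePosixPath(data_dir).relative_to(PurePosixPath(table_root))
    except ValueError:
        # data_dir is not located under the table root at all.
        return None

    components = relative.parts
    if len(components) != len(partition_keys):
        return None

    spec = {}
    for key, component in zip(partition_keys, components):
        name, sep, value = component.partition("=")
        if sep != "=" or name != key or not value:
            # Directory name does not match the expected partition key.
            return None
        spec[key] = value
    return spec
```

For example, calling it with the root "/warehouse/logs", the directory 
"/warehouse/logs/ds=2009-05-17/hr=12", and the keys ["ds", "hr"] yields a spec 
with ds=2009-05-17 and hr=12, while a directory that does not follow the 
naming scheme yields None.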

Joydeep and Frederick mentioned that it would be simpler for users to create 
the HDFS directory and let Hive infer the partition rather than explicitly add 
it. But doing that raises the following issues:

1) External partitions: we would have to mix both approaches, so the partition 
list becomes a merged list of inferred partitions and registered partitions, 
and duplicates have to be resolved.
2) Partition-level schemas can't be supported. Which schema should be chosen 
for an inferred partition: the table schema at the time the inferred partition 
was created, or the latest table schema? And how do we know what the table 
schema was when the inferred partition was created?
3) If partitions have to be registered, a partition can be disabled without 
actually deleting its data. This feature is not supported today and may not be 
that useful, but nevertheless it can't be supported with inferred partitions.
4) Indexes are being added. If partitions are not registered, indexes for such 
partitions cannot be maintained automatically.
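
The merge described in point 1 could look roughly like the sketch below. The 
record layout (a dict with 'spec' and 'location') and the rule that a 
registered partition wins over an inferred one with the same spec are 
assumptions made for illustration only; the issue leaves open how duplicates 
should actually be resolved.

```python
def merge_partitions(registered, inferred):
    """Merge two partition lists, keeping one entry per partition spec.

    Each partition is a dict with a 'spec' dict and a 'location' string
    (hypothetical layout). Assumed policy: a registered partition overrides
    an inferred partition that has the same spec.
    """
    merged = {}
    for part in inferred:
        key = tuple(sorted(part["spec"].items()))
        merged[key] = {**part, "source": "inferred"}
    for part in registered:
        key = tuple(sorted(part["spec"].items()))
        # Overwrites any inferred entry with the same spec.
        merged[key] = {**part, "source": "registered"}
    return list(merged.values())
```

With this policy, an external partition registered for ds=2009-05-16 keeps 
its registered location even if a directory named ds=2009-05-16 also exists 
under the table root.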

I would like to know what the general thinking about this is among users of 
Hive. If inferred partitions are preferred, can we live with the restricted 
functionality that this imposes?

  was:
Initially, the partition list for a table was inferred from the HDFS directory 
structure instead of being looked up in the metastore, where partitions are 
created using 'alter table ... add partition'. This was removed in favor of 
the metadata lookup during the metastore checker work, and also to facilitate 
external partitions.

Joydeep and Frederick mentioned that it would be simpler for users to create 
the HDFS directory and let Hive infer the partition rather than explicitly add 
it. But doing that raises the following issues:

1) External partitions: we would have to mix both approaches, so the partition 
list becomes a merged list of inferred partitions and registered partitions, 
and duplicates have to be resolved.
2) Partition-level schemas can't be supported. Which schema should be chosen 
for an inferred partition: the table schema at the time the inferred partition 
was created, or the latest table schema? And how do we know what the table 
schema was when the inferred partition was created?
3) If partitions have to be registered, a partition can be disabled without 
actually deleting its data. This feature is not supported today and may not be 
that useful, but nevertheless it can't be supported with inferred partitions.
4) Indexes are being added. If partitions are not registered, indexes for such 
partitions cannot be maintained automatically.

I would like to know what the general thinking about this is among users of 
Hive. If inferred partitions are preferred, can we live with the restricted 
functionality that this imposes?


> automatically infer existing partitions of table from HDFS files.
> -----------------------------------------------------------------
>
>                 Key: HIVE-493
>                 URL: https://issues.apache.org/jira/browse/HIVE-493
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore, Query Processor
>    Affects Versions: 0.3.0, 0.3.1, 0.4.0
>            Reporter: Prasad Chakka

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
