[https://issues.apache.org/jira/browse/HIVE-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655354#action_12655354]
Joydeep Sen Sarma commented on HIVE-91:
---------------------------------------
One thing is not clear to me:
when an external table is created pointing to a location, do the
subdirectories automatically get registered as the corresponding partitions in
Hive? Similarly, when new subdirectories are added, what happens? Does Hive
recognize them automatically? (The only other alternative would be to call
'load data ...' with the data directory and the target directory being the
same, which would probably work, but I don't think we have tried it out.)
(This is relevant to HIVE-126, since we are getting rid of the logic
that recognizes partitions based on HDFS contents.)
This seems like a usability issue. If the directories already exist and you are
unwilling to alter them (so that Hive can convert them into its internal table
directory structure), then I presume there are other apps that work
directly against the directory namespace, and perhaps there is already a
pipeline that populates these directories on an ongoing basis. This would suggest
that Hive should simply learn about partitions from the HDFS namespace, rather
than burden those pipelines with calling 'load data' and 'drop partition' on every
subdirectory creation/deletion.
Comments?
> Allow external tables with different partition directory structure
> ------------------------------------------------------------------
>
> Key: HIVE-91
> URL: https://issues.apache.org/jira/browse/HIVE-91
> Project: Hadoop Hive
> Issue Type: Improvement
> Components: Metastore
> Reporter: Johan Oskarsson
> Assignee: Johan Oskarsson
> Priority: Minor
>
> A lot of users have datasets in HDFS in a directory structure similar to this:
> /dataset/yyyy/MM/dd/<one or more files>
> Instead of loading these into Hive the normal way, it would be useful to
> create an external table with the /dataset location and then one partition
> per yyyy/MM/dd. This would require the partition "naming to
> directory" function to be made more flexible.
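A minimal sketch of what a more flexible "naming to directory" function might look like: an interface that maps partition values to a directory path relative to the table location and back, with one implementation for Hive's default column=value directories and one for the bare /yyyy/MM/dd layout described above. The names PartitionPathMapper, KeyValueMapper, ValuesOnlyMapper and partition_location are hypothetical; this is not the Hive metastore API.

```python
# Hypothetical sketch of a pluggable "partition naming to directory" function.
# None of these names come from Hive; they only illustrate the idea.
from typing import Dict, List, Protocol


class PartitionPathMapper(Protocol):
    """Maps partition values to a directory path relative to the table root, and back."""

    def to_path(self, spec: Dict[str, str]) -> str: ...

    def from_path(self, path: str) -> Dict[str, str]: ...


class KeyValueMapper:
    """Hive's default layout: one 'column=value' directory per partition column."""

    def __init__(self, keys: List[str]) -> None:
        self.keys = keys

    def to_path(self, spec: Dict[str, str]) -> str:
        return "/".join(f"{k}={spec[k]}" for k in self.keys)

    def from_path(self, path: str) -> Dict[str, str]:
        spec: Dict[str, str] = {}
        for segment in path.strip("/").split("/"):
            key, sep, value = segment.partition("=")
            if not sep or key not in self.keys:
                raise ValueError(f"not a partition directory: {segment!r}")
            spec[key] = value
        return spec


class ValuesOnlyMapper:
    """The layout from this issue: bare values, each column identified by its depth."""

    def __init__(self, keys: List[str]) -> None:
        self.keys = keys

    def to_path(self, spec: Dict[str, str]) -> str:
        return "/".join(spec[k] for k in self.keys)

    def from_path(self, path: str) -> Dict[str, str]:
        values = path.strip("/").split("/")
        if len(values) != len(self.keys):
            raise ValueError(f"expected {len(self.keys)} levels, got {path!r}")
        return dict(zip(self.keys, values))


def partition_location(table_root: str, mapper: PartitionPathMapper,
                       spec: Dict[str, str]) -> str:
    """Absolute directory for one partition under an external table's location."""
    return table_root.rstrip("/") + "/" + mapper.to_path(spec)


if __name__ == "__main__":
    keys = ["yyyy", "MM", "dd"]
    spec = {"yyyy": "2008", "MM": "12", "dd": "10"}
    print(partition_location("/dataset", KeyValueMapper(keys), spec))    # /dataset/yyyy=2008/MM=12/dd=10
    print(partition_location("/dataset", ValuesOnlyMapper(keys), spec))  # /dataset/2008/12/10
```

The values-only layout identifies each column purely by its position in the path, so it only works when every partition has the same number of levels; the column=value layout is self-describing at the cost of longer directory names.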
--
This message is automatically generated by JIRA.