[ 
https://issues.apache.org/jira/browse/HIVE-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655354#action_12655354
 ] 

Joydeep Sen Sarma commented on HIVE-91:
---------------------------------------

one thing is not clear to me:

when an external table is created pointing to a location - do the 
subdirectories automatically get registered as the corresponding partitions in 
Hive? similarly - when new subdirectories are added - does Hive recognize them 
automatically? (the only alternative would be to call 'load data ...' with the 
source directory and the target directory being the same - which would 
probably work - but i don't think we have tried it out).

(this is kind of relevant to hive-126 - since we are getting rid of the logic 
that recognizes partitions based on hdfs contents).

this seems like a usability issue. if the directories already exist and you are 
unwilling to alter them (so that hive can convert them into the internal table 
directory structure) - then i presume that there are other apps that work 
directly against the directory namespace - and perhaps there is already a 
pipeline that populates these directories on an ongoing basis. this would 
suggest that hive should just learn about partitions from the hdfs namespace - 
rather than burden those pipelines with calling 'load data' and 'drop 
partition' on subdir creation/deletion.
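
to make the idea concrete, here is a rough sketch (python, not hive code) of what "learning partitions from the namespace" could look like for the /dataset/yyyy/MM/dd layout in the issue. the function names, the partition key names (year, month, day) and the plain list of file paths standing in for an hdfs listing are all assumptions for illustration - nothing here reflects how the metastore actually does or will do it.

```python
from pathlib import PurePosixPath


def discover_partitions(file_paths, root, keys=("year", "month", "day")):
    """Map files under root/<v1>/<v2>/<v3>/<file> to partition specs.

    Hypothetical helper: file_paths stands in for a recursive listing
    of the table location. Returns {partition_dir: {key: value}} with
    one entry per directory that holds at least one file.
    """
    root = PurePosixPath(root)
    partitions = {}
    for raw in file_paths:
        try:
            rel = PurePosixPath(raw).relative_to(root)
        except ValueError:
            continue  # path is not under the table location
        parts = rel.parts
        # need one directory level per partition key plus a file name
        if len(parts) <= len(keys):
            continue
        values = parts[:len(keys)]
        location = str(root.joinpath(*values))
        partitions[location] = dict(zip(keys, values))
    return partitions


def diff_partitions(registered, discovered):
    """Return (to_add, to_drop) between known and discovered locations."""
    to_add = sorted(set(discovered) - set(registered))
    to_drop = sorted(set(registered) - set(discovered))
    return to_add, to_drop


files = [
    "/dataset/2008/12/09/part-00000",
    "/dataset/2008/12/09/part-00001",
    "/dataset/2008/12/10/part-00000",
    "/dataset/README",  # too shallow to be inside a partition
]
found = discover_partitions(files, "/dataset")
to_add, to_drop = diff_partitions(
    ["/dataset/2008/12/08", "/dataset/2008/12/09"], found
)
```

the diff step is the part the existing pipelines would otherwise have to do by hand: anything in to_add corresponds to a partition they would need to register, anything in to_drop to a 'drop partition'.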

comments?


> Allow external tables with different partition directory structure
> ------------------------------------------------------------------
>
>                 Key: HIVE-91
>                 URL: https://issues.apache.org/jira/browse/HIVE-91
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Johan Oskarsson
>            Assignee: Johan Oskarsson
>            Priority: Minor
>
> A lot of users have datasets in directory structures similar to this in 
> hdfs: /dataset/yyyy/MM/dd/<one or more files>
> Instead of loading these into Hive the normal way it would be useful to 
> create an external table with the /dataset location and then one partition 
> per yyyy/MM/dd. This would require the partition "naming to 
> directory"-function to be made more flexible.

