Hi Johann,

Creating an external table with the 'location' clause set to your data would
be the way to go. However, Hive has its own directory naming scheme for
partitions ('<partition_key>=<partition_val>'), so just pointing to a
directory with subdirectories would not work.
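As a minimal sketch (the table name and column here are made up for
illustration, not taken from your setup):

    -- External table over existing HDFS data; Hive then expects
    -- partition directories named like ds=<value> under /dataset.
    CREATE EXTERNAL TABLE dataset (line STRING)
    PARTITIONED BY (ds STRING)
    LOCATION '/dataset';

With this, Hive would look for /dataset/ds=2008-11-28 rather than
/dataset/2008/11/28.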

So right now, in this case, one would have to move or copy the data using the
LOAD command.
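For instance, loading one day's files might look like this (the paths and
partition value are illustrative, continuing the sketch above):

    -- LOAD DATA INPATH moves the HDFS files into Hive's own
    -- key=val partition layout; LOCAL INPATH would copy from
    -- the local filesystem instead.
    LOAD DATA INPATH '/dataset/2008/11/28'
    INTO TABLE dataset PARTITION (ds='2008-11-28');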

Going forward, one thing we can do for external tables is drop the 'key=val'
directory naming for partitioned data and just assume that the directory
hierarchy follows the partition list and that the directory names are the
partition values. Is that what's required in this case?
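Under that scheme (hypothetical, not implemented today), something like the
following could map your layout directly:

    -- Hypothetical behavior: directory levels would be read
    -- positionally, so /dataset/2008/11/28 would become the
    -- partition (year='2008', month='11', day='28').
    CREATE EXTERNAL TABLE dataset (line STRING)
    PARTITIONED BY (year STRING, month STRING, day STRING)
    LOCATION '/dataset';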

Joydeep


-----Original Message-----
From: Johan Oskarsson [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 28, 2008 3:49 AM
To: [email protected]
Subject: External tables and existing directory structure

Hi, I just had some fun with Hive. Exciting stuff.

I have one question about mapping tables to our existing directory
structure. I assume "CREATE EXTERNAL TABLE" would be the way to go,
but I haven't been able to find much information about how it works.

We have datasets in the following format in hdfs:
/dataset/yyyy/MM/dd/<one or more files>

I'd love to be able to bind these to Hive tables, with the date as the
partition, without copying or moving the data. Is it currently possible?

/Johan
