Hi Johan,
Creating an external table with the 'location' clause pointing at your data
would be the way to go. However, Hive has its own directory naming scheme for
partitions ('<partition_key>=<partition_val>'), so just pointing at a
directory with subdirectories would not work.
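For example, something like this (a rough sketch; the table and column names
are made up):

  CREATE EXTERNAL TABLE page_views (
    view_time INT,
    userid BIGINT,
    page_url STRING
  )
  PARTITIONED BY (ds STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION '/user/hive/external/page_views';

  -- Hive expects the partition data to live under directories like:
  --   /user/hive/external/page_views/ds=2008-11-28/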
So right now one would have to move or copy the data into Hive's layout using
the LOAD command.
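Roughly:

  -- INPATH moves the HDFS files into the table's partition directory
  -- (use LOCAL INPATH to copy from the local filesystem instead):
  LOAD DATA INPATH '/dataset/2008/11/28'
  INTO TABLE page_views PARTITION (ds='2008-11-28');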
Going forward, one thing we could do for external tables is drop the
'key=val' directory naming for partitioned data and just assume that the
directory hierarchy follows the partition list, with the directory names
being the partition values. Is that what's required in this case?
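Concretely, for a table partitioned on (year, month, day) with location
'/dataset', the proposal would be to interpret the hierarchy as:

  /dataset/2008/11/28/<files>  ->  partition (year='2008', month='11', day='28')

instead of requiring /dataset/year=2008/month=11/day=28/<files>.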
Joydeep
-----Original Message-----
From: Johan Oskarsson [mailto:[EMAIL PROTECTED]
Sent: Friday, November 28, 2008 3:49 AM
To: [email protected]
Subject: External tables and existing directory structure
Hi, just had some fun with Hive. Exciting stuff.
I have one question about mapping tables to our existing directory
structure. I assume "CREATE EXTERNAL TABLE" would be the way to go, but I
haven't been able to find much information about how it works.
We have datasets in the following format in HDFS:
/dataset/yyyy/MM/dd/<one or more files>
I'd love to be able to bind these to Hive tables, with the date as the
partition, without copying or moving the data. Is that currently possible?
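Ideally I'd like to be able to declare something like this (a made-up
sketch; the schema is just a placeholder):

  CREATE EXTERNAL TABLE mydataset (line STRING)
  PARTITIONED BY (year STRING, month STRING, day STRING)
  LOCATION '/dataset';

and have Hive pick up /dataset/2008/11/28/<files> as the partition
(year='2008', month='11', day='28').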
/Johan