I think this is a pretty common scenario as this is how I was storing
my stuff as well. Would this affect the HiveQL create table statement
at all or just implicitly require that it be ordered?
Josh
On Nov 28, 2008, at 3:00 PM, Joydeep Sen Sarma wrote:
Hi Johann,
Creating an external table with the 'location' clause set to your data
would be the way to go. However, Hive has its own directory naming
scheme for partitions ('<partition_key>=<partition_val>'), so just
pointing to a directory with subdirectories would not work.
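For example, if the table were declared along these lines (the table
name, column and 'dt' partition key here are just made up for
illustration):

    CREATE EXTERNAL TABLE dataset (
      log_line STRING
    )
    PARTITIONED BY (dt STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/dataset';

    -- Hive would expect the partition directories to be named like
    --   /dataset/dt=2008-11-28/<files>
    -- rather than the existing /dataset/2008/11/28/<files> layout.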
So right now, in this case, one would have to move or copy the data
using the load command.
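That would be something along these lines (again with made-up names,
and noting that LOAD DATA INPATH moves the files under the table's
location, which is exactly the copying/moving you wanted to avoid):

    LOAD DATA INPATH '/dataset/2008/11/28'
    INTO TABLE dataset
    PARTITION (dt='2008-11-28');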
Going forward, one thing we can do for external tables is drop the
'key=val' directory naming for partitioned data and just assume that
the directory hierarchy follows the partition list and that the
directory names are the partition values. Is that what's required in
this case?
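If so, the idea would look roughly like this (just a sketch of the
proposal, not something Hive supports today; names are hypothetical):

    CREATE EXTERNAL TABLE dataset (
      log_line STRING
    )
    PARTITIONED BY (year STRING, month STRING, day STRING)
    LOCATION '/dataset';

    -- /dataset/2008/11/28/<files> would then be read as the
    -- partition year=2008, month=11, day=28.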
Joydeep
-----Original Message-----
From: Johan Oskarsson [mailto:[EMAIL PROTECTED]
Sent: Friday, November 28, 2008 3:49 AM
To: [email protected]
Subject: External tables and existing directory structure
Hi, I just had some fun with Hive. Exciting stuff.
I have one question about mapping tables to our existing directory
structure. I assume "CREATE EXTERNAL TABLE" would be the way to go,
but I haven't been able to find much information about how it works.
We have datasets in the following format in hdfs:
/dataset/yyyy/MM/dd/<one or more files>
I'd love to be able to bind these to Hive tables, with the date as the
partition, without copying or moving the data. Is that currently
possible?
/Johan