I think this is a pretty common scenario, as this is how I was storing my stuff as well. Would this affect the HiveQL create table statement at all, or just implicitly require that the directory hierarchy be ordered to match the partition columns?

Josh

On Nov 28, 2008, at 3:00 PM, Joydeep Sen Sarma wrote:

Hi Johan,

Creating an external table with the 'location' clause set to your data would be the way to go. However, Hive has its own directory naming scheme for partitions ('<partition_key>=<partition_val>'), so just pointing to a directory with subdirectories would not work.
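
To make this concrete, here is a rough sketch (table and column names are made up for illustration):

  CREATE EXTERNAL TABLE dataset (line STRING)
  PARTITIONED BY (year STRING, month STRING, day STRING)
  LOCATION '/dataset';

With the current naming scheme, the partition directories under /dataset would be expected to look like /dataset/year=2008/month=11/day=28/<files> rather than the plain /dataset/2008/11/28/<files> layout.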

So right now, in this case, one would have to move or copy the data using the load command.
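
For the sketch above, that would be something along the lines of (paths and values just for illustration):

  LOAD DATA INPATH '/dataset/2008/11/28'
  INTO TABLE dataset PARTITION (year='2008', month='11', day='28');

which moves the files under the table's own partition directory.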

Going forward, one thing we can do for external tables is drop the 'key=val' directory naming for partitioned data and just assume that the directory hierarchy follows the partition list, with the directory names being the partition values. Is that what's required in this case?
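
In other words, for an external table partitioned by (year, month, day) with location '/dataset', each directory level would be taken positionally as the value of the corresponding partition column - just a sketch of the idea, not existing behavior:

  /dataset/2008/11/28/<files>  ->  year=2008, month=11, day=28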

Joydeep


-----Original Message-----
From: Johan Oskarsson [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 28, 2008 3:49 AM
To: [email protected]
Subject: External tables and existing directory structure

Hi, just had some fun with Hive. Exciting stuff.

I have one question about mapping tables to our existing directory
structure. I assume the "CREATE EXTERNAL TABLE" statement would be the way to go,
but I haven't been able to find much information about how it works.

We have datasets in the following format in hdfs:
/dataset/yyyy/MM/dd/<one or more files>

I'd love to be able to bind these to Hive tables with the date as the
partition, without copying or moving the data. Is this currently possible?
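
For example (column names just to illustrate what I have in mind), I'd like a query such as

  SELECT COUNT(1) FROM dataset
  WHERE year = '2008' AND month = '11' AND day = '28';

to only read /dataset/2008/11/28/ while the files stay where they are.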

/Johan
