Thanks for the answer.

I like the idea of having a more flexible way of specifying how a partition maps to the directory structure. I'll see if I can find some time to look at this; in the meantime I've filed a ticket for it: https://issues.apache.org/jira/browse/HIVE-91

Had a quick look at HIVE-86 (don't delete data), but I'm not quite sure what each component is doing. Is there an updated version of this wiki page anywhere? http://wiki.apache.org/hadoop/Hive/DeveloperGuide

If not, could someone explain what HiveMetaStore* does compared to MetaStore*? Is one a newer version and the other older? And what does FileStore do compared to the above? Does it store the meta db in files instead of an SQL db?

There seem to be two different drop table methods as far as I can see. Are both used?
RWTable.drop()
HiveMetaStore.drop_table()

/Johan

Joydeep Sen Sarma wrote:
That's one possibility.

Or we could have a 'format' spec in the create table command for how the
directories are named. By default it would be '%key=%value', but in this case it
would be '%value'. This might make it more flexible if we encounter other kinds
of directory layouts.

Thoughts?
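To make the proposal concrete, here is a purely hypothetical spelling of such a format spec (none of this syntax exists in Hive today; the table name, column, and clause name are made up for illustration):

```sql
-- HYPOTHETICAL syntax sketch only -- not implemented.
-- Default behaviour, equivalent to today's <key>=<value> directories:
CREATE EXTERNAL TABLE logs (line STRING)
PARTITIONED BY (ds STRING)
PARTITION FORMAT '%key=%value'
LOCATION '/dataset';

-- For a layout where directories are just the partition values
-- (e.g. /dataset/2008/11/28), the spec would instead be:
--   PARTITION FORMAT '%value'
```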

(just remembered that there's probably an unfiled issue that drop table should 
not be deleting directories for external tables - but it probably does right now 
..)

-----Original Message-----
From: Josh Ferguson [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 28, 2008 3:32 PM
To: [email protected]
Subject: Re: External tables and existing directory structure

I think this is a pretty common scenario, as this is how I was storing my stuff as well. Would this affect the HiveQL create table statement at all, or just implicitly require that it be ordered?

Josh

On Nov 28, 2008, at 3:00 PM, Joydeep Sen Sarma wrote:

Hi Johan,

Creating an external table with the 'location' clause set to your data would be the way to go. However, Hive has its own directory naming scheme for partitions ('<partition_key>=<partition_val>'), so just pointing to a directory with subdirectories would not work.
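As a minimal sketch of what that looks like (the table name, column, and path are made up for illustration):

```sql
-- Illustrative only: an external table over an existing HDFS path.
-- With this definition Hive expects partition subdirectories named
-- <key>=<value>, e.g. /dataset/ds=2008-11-28/, which is why a plain
-- yyyy/MM/dd hierarchy does not match.
CREATE EXTERNAL TABLE logs (line STRING)
PARTITIONED BY (ds STRING)
LOCATION '/dataset';
```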

So right now, in this case, one would have to move or copy the data using the load command.
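A sketch of what that load could look like (the paths, table, and partition key are assumptions, not taken from Johan's actual setup):

```sql
-- Illustrative only: move one day's files into a partition that Hive
-- manages, which Hive stores under a ds=2008-11-28 directory.
LOAD DATA INPATH '/dataset/2008/11/28'
INTO TABLE logs PARTITION (ds='2008-11-28');
```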

Going forward, one thing we could do for external tables is drop the 'key=val' directory naming for partitioned data and just assume that the directory hierarchy follows the partition list, with the directory names being the partition values. Is that what's required in this case?

Joydeep


-----Original Message-----
From: Johan Oskarsson [mailto:[EMAIL PROTECTED]
Sent: Friday, November 28, 2008 3:49 AM
To: [email protected]
Subject: External tables and existing directory structure

Hi, just had some fun with Hive. Exciting stuff.

I have one question about mapping tables to our existing directory
structure. I assume the "CREATE EXTERNAL TABLE" would be the way to go,
but I haven't been able to find much information about how it works.

We have datasets in the following format in hdfs:
/dataset/yyyy/MM/dd/<one or more files>

I'd love to be able to bind these to Hive tables, with the date as the
partition, without copying or moving the data. Is this currently possible?

/Johan

