Thanks for the answer.

I like the idea of having a more flexible way of specifying how a partition maps to the directory structure. I'll see if I can find some time to look at this; in the meantime I've filed a ticket for it: https://issues.apache.org/jira/browse/HIVE-91

Had a quick look at HIVE-86 (don't delete data), but I'm not quite sure what each component is doing. Is there an updated version of this wiki page anywhere? http://wiki.apache.org/hadoop/Hive/DeveloperGuide

If not, could someone explain what HiveMetaStore* does compared to MetaStore*? Is one a newer version and the other older? And what does FileStore do compared to the above? Does it store the meta db in files instead of an SQL db?

There seem to be two different drop table methods as far as I can see. Are both used?
RWTable.drop()
HiveMetaStore.drop_table()

/Johan

Joydeep Sen Sarma wrote:
That's one possibility.

Or we could have a 'format' spec in the create table command for how the
directories are named. By default it would be '%key=%value', but in this case it
would be '%value'. This might make it more flexible if we encounter other kinds
of directory layouts.

Thoughts?
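To make the proposal concrete, here is a purely hypothetical spelling of such a format spec (none of this syntax exists in Hive today; the table name, column, and clause name are made up for illustration):

```sql
-- HYPOTHETICAL syntax sketch only -- not implemented.
-- Default behaviour, equivalent to today's <key>=<value> directories:
CREATE EXTERNAL TABLE logs (line STRING)
PARTITIONED BY (ds STRING)
PARTITION FORMAT '%key=%value'
LOCATION '/dataset';

-- For a layout where directories are just the partition values
-- (e.g. /dataset/2008/11/28), the spec would instead be:
--   PARTITION FORMAT '%value'
```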

(just remembered that there's probably an unfiled issue that drop table should 
not be deleting directories for external tables - but it probably does right now 
..)

-----Original Message-----
From: Josh Ferguson [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 28, 2008 3:32 PM
To: [email protected]
Subject: Re: External tables and existing directory structure

I think this is a pretty common scenario, as this is how I was storing my stuff as well. Would this affect the HiveQL create table statement at all, or just implicitly require that it be ordered?

Josh

On Nov 28, 2008, at 3:00 PM, Joydeep Sen Sarma wrote:

Hi Johan,

Creating an external table with the 'location' clause set to your data would be the way to go. However, Hive has its own directory naming scheme for partitions ('<partition_key>=<partition_val>'), so just pointing to a directory with subdirectories would not work.
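As a minimal sketch of what that looks like (the table name, column, and path are made up for illustration):

```sql
-- Illustrative only: an external table over an existing HDFS path.
-- With this definition Hive expects partition subdirectories named
-- <key>=<value>, e.g. /dataset/ds=2008-11-28/, which is why a plain
-- yyyy/MM/dd hierarchy does not match.
CREATE EXTERNAL TABLE logs (line STRING)
PARTITIONED BY (ds STRING)
LOCATION '/dataset';
```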

So right now, in this case, one would have to move or copy the data using the load command.
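A sketch of what that load could look like (the paths, table, and partition key are assumptions, not taken from Johan's actual setup):

```sql
-- Illustrative only: move one day's files into a partition that Hive
-- manages, which Hive stores under a ds=2008-11-28 directory.
LOAD DATA INPATH '/dataset/2008/11/28'
INTO TABLE logs PARTITION (ds='2008-11-28');
```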

Going forward, one thing we could do for external tables is drop the 'key=val' directory naming for partitioned data and just assume that the directory hierarchy follows the partition list, with the directory names being the partition values. Is that what's required in this case?

Joydeep


-----Original Message-----
From: Johan Oskarsson [mailto:[EMAIL PROTECTED]
Sent: Friday, November 28, 2008 3:49 AM
To: [email protected]
Subject: External tables and existing directory structure

Hi, just had some fun with Hive. Exciting stuff.

I have one question about mapping tables to our existing directory
structure. I assume the "CREATE EXTERNAL TABLE" would be the way to go,
but I haven't been able to find much information about how it works.

We have datasets in the following format in hdfs:
/dataset/yyyy/MM/dd/<one or more files>

I'd love to be able to bind these to Hive tables, with the date as the
partition, without copying or moving the data. Is this currently possible?

/Johan

