Thanks for the answer.
I like the idea of having a more flexible way of specifying how a
partition maps to the directory structure.
I'll see if I have some time to look at this; in the meantime I've
filed a ticket for it: https://issues.apache.org/jira/browse/HIVE-91
I had a quick look at HIVE-86 (don't delete data) but I'm not quite
sure what each component does.
Is there an updated version of this wiki page anywhere?
http://wiki.apache.org/hadoop/Hive/DeveloperGuide
If not, could someone explain what the HiveMetaStore* classes do
compared to the MetaStore* ones? Is one newer and one older?
And what does FileStore do compared to the above? Does it store the
metadata in files instead of a SQL db?
There seem to be two different drop table methods as far as I can see.
Are both used?
RWTable.drop()
HiveMetaStore.drop_table()
/Johan
Joydeep Sen Sarma wrote:
That's one possibility.
Or we could have a 'format' spec in the create table command for how the
directories are named. By default it's '%key=%value', but in this case it's
'%value'. This might make it more flexible if we encounter other kinds of
directory layouts.
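Roughly something like this (made-up syntax and column names, just to
sketch the idea - there is no PARTITION FORMAT clause today):

  CREATE EXTERNAL TABLE dataset (line STRING)
  PARTITIONED BY (year STRING, month STRING, day STRING)
  PARTITION FORMAT '%value'
  LOCATION '/dataset';

With '%value' the subdirectories would be read as bare partition values
(/dataset/2008/11/28) instead of the default key=value form.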
Thoughts?
(Just remembered that there's probably an unfiled issue that drop table
should not delete directories for external tables - but it probably does
right now.)
-----Original Message-----
From: Josh Ferguson [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 28, 2008 3:32 PM
To: [email protected]
Subject: Re: External tables and existing directory structure
I think this is a pretty common scenario, as this is how I was storing
my stuff as well. Would this affect the HiveQL create table statement
at all, or just implicitly require that the partition list be ordered?
Josh
On Nov 28, 2008, at 3:00 PM, Joydeep Sen Sarma wrote:
Hi Johan,
Creating an external table with the 'location' clause set to your data
would be the way to go. However, Hive has its own directory naming
scheme for partitions ('<partition_key>=<partition_val>'), so just
pointing to a directory with subdirectories would not work.
So right now, in this case, one would have to move or copy the data
using the load command.
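Something along these lines (a sketch - the table schema and paths are
just placeholders):

  CREATE TABLE dataset (line STRING)
  PARTITIONED BY (ds STRING);

  -- moves the files under Hive's own key=value partition directory
  LOAD DATA INPATH '/dataset/2008/11/28'
  INTO TABLE dataset PARTITION (ds='2008-11-28');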
Going forward, one thing we can do for external tables is drop the
'key=val' directory naming for partitioned data and just assume that
the directory hierarchy follows the partition list and the directory
names are the partition values. Is that what's required in this case?
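For example, a table partitioned by (year, month, day) would read:

  /dataset/2008/11/28/  ->  partition (year='2008', month='11', day='28')

instead of requiring /dataset/year=2008/month=11/day=28/.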
Joydeep
-----Original Message-----
From: Johan Oskarsson [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 28, 2008 3:49 AM
To: [email protected]
Subject: External tables and existing directory structure
Hi, just had some fun with Hive. Exciting stuff.
I have one question about mapping tables to our existing directory
structure. I assume "CREATE EXTERNAL TABLE" would be the way to go,
but I haven't been able to find much information about how it works.
We have datasets in the following format in hdfs:
/dataset/yyyy/MM/dd/<one or more files>
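For example (illustrative file names):

  /dataset/2008/11/27/part-00000
  /dataset/2008/11/28/part-00000
  /dataset/2008/11/28/part-00001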
I'd love to be able to bind these to Hive tables, with the date as the
partition, without copying or moving the data. Is that currently
possible?
/Johan