That's one possibility. Or we could have a 'format' spec in the create table command for how the directories are named. By default it would be '%key=%value', but in this case it would be '%value'. This might make it more flexible if we encounter other kinds of directory layouts.
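To make the idea concrete, here is a rough sketch in Python of how such a per-table format spec could map directory names to partition values. It is not how Hive does it today: the function name, the partition key names (year, month, day), and the exact handling of '%key' and '%value' are all assumptions made for illustration only.

```python
def parse_partition_dirs(rel_path, partition_keys, fmt="%key=%value"):
    """Map directory names under a table location to partition values.

    Hypothetical helper, not part of Hive. `fmt` is the proposed
    directory-name format: '%key=%value' (the current default layout)
    or '%value' (directory names are bare partition values).
    """
    if "%value" not in fmt:
        raise ValueError("format spec must contain %value")
    raw_prefix, raw_suffix = fmt.split("%value", 1)

    # One directory level per partition key, in partition-list order.
    dirs = [d for d in rel_path.split("/") if d]
    if len(dirs) < len(partition_keys):
        raise ValueError("path has fewer directories than partition keys")

    spec = {}
    for key, dirname in zip(partition_keys, dirs):
        # Substitute the expected key name into the literal parts of the format.
        prefix = raw_prefix.replace("%key", key)
        suffix = raw_suffix.replace("%key", key)
        if (len(dirname) <= len(prefix) + len(suffix)
                or not dirname.startswith(prefix)
                or not dirname.endswith(suffix)):
            raise ValueError(
                "directory %r does not match format %r for key %r"
                % (dirname, fmt, key))
        # Whatever sits between prefix and suffix is the partition value.
        spec[key] = dirname[len(prefix):len(dirname) - len(suffix)]
    return spec


# Existing layout from the original question: /dataset/yyyy/MM/dd/<files>
bare = parse_partition_dirs("2008/11/28/part-00000",
                            ["year", "month", "day"], fmt="%value")

# Hive's current naming scheme, using the default format.
default = parse_partition_dirs("year=2008/month=11/day=28",
                               ["year", "month", "day"])
```

With '%value' the directory hierarchy just has to follow the partition list in order, which is the implicit ordering requirement Josh asked about.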
Thoughts?

(Just remembered that there's probably an unfiled issue that drop table should not be deleting directories for external tables, but it probably does right now.)

-----Original Message-----
From: Josh Ferguson [mailto:[EMAIL PROTECTED]
Sent: Friday, November 28, 2008 3:32 PM
To: [email protected]
Subject: Re: External tables and existing directory structure

I think this is a pretty common scenario as this is how I was storing my stuff as well. Would this affect the HiveQL create table statement at all or just implicitly require that it be ordered?

Josh

On Nov 28, 2008, at 3:00 PM, Joydeep Sen Sarma wrote:

> Hi Johan,
>
> Create external table with the 'location' clause set to your data
> would be the way to go. However, Hive has its own directory naming
> scheme for partitions ('<partition_key>=<partition_val>'). So just
> pointing to a directory with subdirectories would not work.
>
> So right now one would have to move or copy the data using the
> load command.
>
> Going forward, one thing we can do is that for external tables we
> can drop the 'key=val' directory naming for partitioned stuff and
> just assume that directory hierarchy follows the partition list and
> the directory names are partition values. Is that what's required
> in this case?
>
> Joydeep
>
>
> -----Original Message-----
> From: Johan Oskarsson [mailto:[EMAIL PROTECTED]
> Sent: Friday, November 28, 2008 3:49 AM
> To: [email protected]
> Subject: External tables and existing directory structure
>
> Hi, just had some fun with hive. Exciting stuff.
>
> I have one question about mapping tables to our existing directory
> structure. I assume the "CREATE EXTERNAL TABLE" would be the way to go,
> but I haven't been able to find much information about how it works.
>
> We have datasets in the following format in hdfs:
> /dataset/yyyy/MM/dd/<one or more files>
>
> I'd love to be able to bind these with the date as the partition to hive
> tables without copying or moving the data. Is it currently possible?
>
> /Johan
