Hive does not produce any results because the external table does not have any 
partitions defined. Partitions are not inferred from file system but are 
created when 'load data local...' or 'alter table ... add partition ..' 
commands are executed. So you need to execute the later cmd on the external 
table along with the load command on the regular table.

When Hive supports views or table aliases, this step will not needed but until 
then you need to do this extra step.

________________________________
From: Jason Michael <[email protected]>
Reply-To: <[email protected]>
Date: Wed, 29 Jul 2009 16:07:35 -0700
To: hive mailing list <[email protected]>
Subject: Create external table over top of existing, partitioned table

Hello,

I noticed the following behavior when trying to create an external table over 
top of an existing, partitioned table:

<snip>
hive> create table foo (id1 int, id2 int) partitioned by(p1 int)  ROW FORMAT 
DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' stored as 
textfile;
OK
Time taken: 0.12 seconds
hive> load data local inpath '/temp/foo1' overwrite into table foo 
partition(p1=1);
Copying data from file:/temp/foo1
Loading data to table foo partition {p1=1}
OK
Time taken: 1.493 seconds
hive> load data local inpath '/temp/foo2' overwrite into table foo 
partition(p1=2);
Copying data from file:/temp/foo2
Loading data to table foo partition {p1=2}
OK
Time taken: 1.218 seconds
hive> select * from foo;
OK
1    1    1
2    1    1
3    2    2
4    3    2
Time taken: 0.781 seconds
hive> create external table foo2 (id1 int, id2 int) partitioned by(p1 int)  ROW 
FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' stored 
as textfile location '/user/hive/warehouse/foo/';
OK
Time taken: 0.161 seconds
hive> select * from foo2;
OK
Time taken: 0.398 seconds
</snip>

I had hoped that the last select query would return the same results as the 
first one since, I thought, both tables would be pointing at the same data.  
What I'm attempting to do is create what amounts to a symlink or table alias in 
Hive.  In our warehouse, we have several large tables which get populated with 
different versions of an ETL tool.  We hoped to have, at any given time, a 
single table name that was just an alias for whichever of the large tables we 
considered 'live' at the time.  That way users of the system would only have to 
remember one set of table names to work with.  Is something like this possible 
in Hive?  We noted that the above approach worked fine with non-partitioned 
tables, presumably because the data files are located exactly where the 
location parameter has them rather than underneath a layer of partition subdirs.

Jason

Reply via email to