Hello,
I noticed the following behavior when trying to create an external table over
top of an existing, partitioned table:
<snip>
hive> create table foo (id1 int, id2 int) partitioned by(p1 int) ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' stored as
textfile;
OK
Time taken: 0.12 seconds
hive> load data local inpath '/temp/foo1' overwrite into table foo
partition(p1=1);
Copying data from file:/temp/foo1
Loading data to table foo partition {p1=1}
OK
Time taken: 1.493 seconds
hive> load data local inpath '/temp/foo2' overwrite into table foo
partition(p1=2);
Copying data from file:/temp/foo2
Loading data to table foo partition {p1=2}
OK
Time taken: 1.218 seconds
hive> select * from foo;
OK
1 1 1
2 1 1
3 2 2
4 3 2
Time taken: 0.781 seconds
hive> create external table foo2 (id1 int, id2 int) partitioned by(p1 int) ROW
FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' stored
as textfile location '/user/hive/warehouse/foo/';
OK
Time taken: 0.161 seconds
hive> select * from foo2;
OK
Time taken: 0.398 seconds
</snip>
I had hoped that the last select query would return the same results as the
first one since, I thought, both tables would be pointing at the same data.
What I'm attempting to do is create what amounts to a symlink or table alias in
Hive. In our warehouse, we have several large tables which get populated with
different versions of an ETL tool. We hoped to have, at any given time, a
single table name that was just an alias for whichever of the large tables we
considered 'live' at the time. That way users of the system would only have to
remember one set of table names to work with. Is something like this possible
in Hive? We noted that the above approach worked fine with non-partitioned
tables, presumably because the data files are located exactly where the
location parameter has them rather than underneath a layer of partition subdirs.
Jason