Correct me if I'm wrong, but it looks like I'll need an INSERT statement for
each (DS, TYPE) partition:
hive> INSERT OVERWRITE TABLE SYSLOG_SEQUENCE
PARTITION(ds='2009-12-07',type='user') select month, day, time, host, logline
from syslog where syslog.ds='2009-12-07' and syslog.type='user';
Loading data to table syslog_sequence partition {ds=2009-12-07, type=user}
OK
Since I'm programmatically loading the data from files into the syslog table in
the first place, I just need to add another statement to that program that adds
the data to syslog_sequence every time it adds a new partition to syslog.
From: [email protected] [mailto:[email protected]]
Sent: Wednesday, December 16, 2009 6:16 PM
To: [email protected]
Subject: How do I INSERT OVERWRITE into a new table if it's partitioned?
Hello,
I'd like to store my log file data that's imported into Hive in compressed
format. I was following some steps outlined by Zheng on how to do that, where
he says:
CREATE TABLE texttable (...) STORED AS TEXTFILE;
LOAD DATA ... OVERWRITE INTO texttable;
CREATE TABLE seqtable (...) STORED AS SEQUENCEFILE;
set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
set mapred.output.compression.type=BLOCK;
INSERT OVERWRITE TABLE seqtable SELECT * FROM texttable;
but I get stuck on the last step.
I can't write to my new SYSLOG_SEQUENCE table because the tables are
partitioned:
hive> INSERT OVERWRITE TABLE SYSLOG_SEQUENCE SELECT * FROM SYSLOG;
FAILED: Error in semantic analysis: need to specify partition columns because
the destination table is partitioned.
What syntax can I use to get the data in the new table? It has DS and TYPE as
partition columns:
hive> describe syslog;
OK
month string from deserializer
day string from deserializer
time string from deserializer
host string from deserializer
logline string from deserializer
ds string
type string
I took a stab at it like below, but that only gives me two partitions in total,
and of course what I want is the same partitions as exist in the original
SYSLOG table.
Any pointers welcome-
Thanks
Ken
hive> INSERT OVERWRITE TABLE SYSLOG_SEQUENCE PARTITION(ds='*',type='*') select
month, day, time, host, logline from syslog;
OK
Time taken: 94.885 seconds