Correct me if I'm wrong, but it looks like I'll need an INSERT statement for 
each (DS, TYPE) partition:

hive> INSERT OVERWRITE TABLE SYSLOG_SEQUENCE 
PARTITION(ds='2009-12-07',type='user') select month, day, time, host, logline 
from syslog where syslog.ds='2009-12-07' and syslog.type='user';
Loading data to table syslog_sequence partition {ds=2009-12-07, type=user}
OK

Since I'm programmatically loading the data from files into the syslog table in 
the first place, I just need to add another statement to that program that adds 
the data to syslog_sequence every time it  adds a new partition to syslog.

From: [email protected] [mailto:[email protected]]
Sent: Wednesday, December 16, 2009 6:16 PM
To: [email protected]
Subject: How do I INSERT OVERWRITE into a new table if it's partitioned?

Hello,

I'd like to store my log file data that's imported into Hive in compressed 
format. I was following some steps outlined by Zheng on how to do that, where 
he says:

CREATE TABLE texttable (...) STORED AS TEXTFILE;
LOAD DATA ... OVERWRITE INTO texttable;
CREATE TABLE seqtable (...) STORED AS SEQUENCEFILE;
set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
set mapred.output.compression.type=BLOCK;
INSERT OVERWRITE TABLE seqtable SELECT * FROM texttable;

but I get stuck on the last step.

I can't write to my new SYSLOG_SEQUENCE  table because the tables are 
partitioned:

hive> INSERT OVERWRITE TABLE SYSLOG_SEQUENCE  SELECT * FROM SYSLOG;
FAILED: Error in semantic analysis: need to specify partition columns because 
the destination table is partitioned.

What syntax can I use to get the data in the new table? It has DS and TYPE as 
partition columns:

hive> describe syslog;
OK
month   string  from deserializer
day     string  from deserializer
time    string  from deserializer
host    string  from deserializer
logline string  from deserializer
ds      string
type    string

I took a stab at it like below, but that only gives me two partitions in total, 
and of course what I want is the same partitions as exist in the original 
SYSLOG table.

Any pointers welcome-
Thanks
Ken

hive> INSERT OVERWRITE TABLE SYSLOG_SEQUENCE PARTITION(ds='*',type='*') select 
month, day, time, host, logline from syslog;
OK
Time taken: 94.885 seconds

Reply via email to