Hi Tim! Thanks for the quick response. I ended up creating a partition for day and hour, but that slowed down my query times a lot for Hive (it took about 2 minutes just to post to the job scheduler). I think daily will work, though I hate to keep rewriting today's data in the partition over and over again. If I end up doing something else, I'll make sure to post it.
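For anyone following along, the day-and-hour layout mentioned above would look roughly like the sketch below. The table name ntp_hcat_hourly and the request_hour partition column are hypothetical (the actual names are not given in this thread), and only three of the original columns are shown to keep it short.

```sql
-- Sketch only: ntp_hcat_hourly and request_hour are assumed names,
-- not the ones actually used.
CREATE TABLE ntp_hcat_hourly(
  date_time STRING,
  ip STRING,
  status INT)
COMMENT 'Hypothetical day-and-hour partitioned variant of ntp_hcat'
PARTITIONED BY (request_date STRING, request_hour STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Filtering on both partition columns lets Hive read a single
-- request_date=.../request_hour=... directory.
SELECT COUNT(*)
FROM ntp_hcat_hourly
WHERE request_date = '2013-03-27'
  AND request_hour = '15';
```

With this layout each day produces up to 24 partitions instead of one, so Hive has many more partitions to enumerate when it plans a query, which may be part of the slowdown described above.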
Thanks,
Christian

On Wed, Mar 27, 2013 at 3:29 PM, Timothy Potter <thelabd...@gmail.com> wrote:

> Hi Christian,
>
> We do something similar but there's no append to an existing partition
> afaik - I'm surprised it's not failing to write the new when it already
> exists. We use a more granular partition scheme or re-write the entire
> partition each time.
>
> Cheers,
> Tim
>
>
> On Wed, Mar 27, 2013 at 3:07 PM, Christian <engr...@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to run a pig job every few minutes that should end up using
>> HCat's automatic partitioning to store the data in the correct directory
>> (/apps/hive/warehouse/ntp_hcat/request_date=2013-03-27/)
>>
>> I've set the partition column and I can successfully write data and it
>> goes to the correct place. The problem I am having is that every time I run
>> the job, it is deleting the existing data in the directory (partition).
>>
>> My store call is simply:
>>
>> STORE complete INTO 'ntp_hcat' USING org.apache.hcatalog.pig.HCatStorer();
>>
>> My table definition in Hive is:
>>
>> CREATE TABLE ntp_hcat(
>>   year INT,
>>   month INT,
>>   day INT,
>>   date_time STRING,
>>   hour INT,
>>   minute INT,
>>   second INT,
>>   seconds_in_day BIGINT,
>>   ip STRING,
>>   method STRING,
>>   path STRING,
>>   original_path STRING,
>>   is_static_resource STRING,
>>   is_page STRING,
>>   status INT,
>>   referrer_host STRING,
>>   referrer STRING,
>>   original_referrer STRING,
>>   agent STRING,
>>   content_length BIGINT,
>>   response_time FLOAT,
>>   web_server STRING,
>>   app_server STRING,
>>   session_id STRING,
>>   sold_to_party_num STRING,
>>   customer_name STRING,
>>   login_id STRING,
>>   employee_id STRING,
>>   first_name STRING,
>>   last_name STRING,
>>   session_start_date STRING,
>>   browser STRING,
>>   browser_version STRING,
>>   is_slow_response STRING)
>> COMMENT 'This is the ntp apache requests table'
>> partitioned by (request_date string)
>> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
>> STORED AS TEXTFILE;
>>
>> I am using HDP 1.2.1. What am I doing wrong?
>>
>> Thank you,
>> Christian