Also, I should note that I was using data stored in TEXTFILE format, so I imagine that's why just copying the files into the partition folder worked.
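To make that concrete, the manual copy amounted to something like the one-liner below (the file name data3 and the warehouse path are just placeholders matching the example further down in the thread, not my real layout), and it assumes the partition folder already exists and is known to Hive:

# copy one more data file into an existing partition folder (paths are placeholders)
hadoop fs -put /tmp/data3 /user/hive/warehouse/mytable/2010-03-16/

If the partition isn't registered with Hive yet, I believe you'd also need an ALTER TABLE ... ADD PARTITION first, though I haven't verified that on 0.5.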
I am pretty new to Hive myself but I would guess the correct way to do it
would be as Edward suggested, to use a LOAD statement:

For files that exist in the local filesystem:
LOAD DATA LOCAL INPATH '/tmp/datafile.txt' INTO TABLE mytable PARTITION(dt='2010-03-16')

For files that exist in HDFS:
LOAD DATA INPATH '/user/data/datafile.txt' INTO TABLE mytable PARTITION(dt='2010-03-16')

Let me know how things work out if you try it!

- Y

On Wed, Mar 17, 2010 at 1:57 PM, Ryan LeCompte <[email protected]> wrote:

> This is interesting... thanks for the response.
>
> My tables are not defined as "external" tables, however. I wonder if this
> would still work?
>
> Thanks,
> Ryan
>
> On Wed, Mar 17, 2010 at 4:46 PM, Yen Pai <[email protected]> wrote:
>
>> Hi Ryan,
>>
>> I was just experimenting with this recently and this is my experience
>> with "external" tables. I would imagine regular tables work similarly.
>>
>> In Hive a partition is actually a folder in HDFS, so if you put another
>> file in the partition folder, formatted according to the original table
>> definition, you are in effect "appending" to the partition.
>>
>> For example, if your table exists as:
>> /user/hive/warehouse/mytable/
>>
>> And you have a partition folder:
>> /user/hive/warehouse/mytable/2010-03-16/
>>
>> With data files inside it:
>> /user/hive/warehouse/mytable/2010-03-16/data1
>> /user/hive/warehouse/mytable/2010-03-16/data2
>>
>> You can just put more files in the partition folder in HDFS (data3,
>> data4, etc.) and they will be recognized as part of the partition.
>>
>> - Yen
>>
>> On Wed, Mar 17, 2010 at 1:05 PM, Ryan LeCompte <[email protected]> wrote:
>>
>>> Actually, I wasn't clear earlier... we are currently using this syntax
>>> for loading data into the table/partition:
>>>
>>> INSERT OVERWRITE TABLE ourtable PARTITION(dt='2010-03-16') ...
>>>
>>> If I execute this multiple times, I believe the data will simply be
>>> overwritten instead of appended, right?
>>>
>>> On Wed, Mar 17, 2010 at 4:01 PM, Ryan LeCompte <[email protected]> wrote:
>>>
>>>> Awesome! I didn't know this. :) I'll give it a shot, thanks!
>>>>
>>>> On Wed, Mar 17, 2010 at 3:57 PM, Edward Capriolo <[email protected]> wrote:
>>>>
>>>>> On Wed, Mar 17, 2010 at 3:30 PM, Ryan LeCompte <[email protected]> wrote:
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> Is it possible in Hive 0.5 to run multiple inserts into the same Hive
>>>>>> table/partition? Or is this not supported due to the fact that Hadoop
>>>>>> doesn't support appends properly?
>>>>>>
>>>>>> For example, it would be nice to periodically add new data every 5
>>>>>> minutes to a table that has a partition column for "date" via multiple
>>>>>> periodic INSERT statements.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Ryan
>>>>>>
>>>>> Ryan,
>>>>>
>>>>> Every file inside the partition makes up the partition. So with 'LOAD
>>>>> DATA INPATH (X)', if X is a unique name it will be "appended".
>>>>>
>>>>> This works for us since our 5 minute log files all have unique names.
>>>>>
>>>>> Edward
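Putting Edward's point about unique file names together with the 5-minute use case above, the periodic append would look roughly like the following sketch (the log file names are made up for illustration; the only requirement is that each load brings in a file whose name isn't already sitting in the partition folder):

-- each 5-minute file has a unique name, so successive loads accumulate in the partition
LOAD DATA INPATH '/user/data/logs-20100316-1200.txt' INTO TABLE mytable PARTITION(dt='2010-03-16');
LOAD DATA INPATH '/user/data/logs-20100316-1205.txt' INTO TABLE mytable PARTITION(dt='2010-03-16');

Because neither statement uses the OVERWRITE keyword, each one just moves its file into the partition folder alongside the files already there, instead of replacing the partition contents the way a repeated INSERT OVERWRITE would.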
