Also, I should note that I was using data stored in TEXTFILE format, so I imagine that's why just copying the files into the partition folder worked.
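To make that concrete, the manual copy amounted to something like the one-liner below (the file name data3 and the warehouse path are just placeholders matching the example further down in the thread, not my real layout), and it assumes the partition folder already exists and is known to Hive:

# copy one more data file into an existing partition folder (paths are placeholders)
hadoop fs -put /tmp/data3 /user/hive/warehouse/mytable/2010-03-16/

If the partition isn't registered with Hive yet, I believe you'd also need an ALTER TABLE ... ADD PARTITION first, though I haven't verified that on 0.5.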
I am pretty new to Hive myself but I would guess the correct way to do it
would be as Edward suggested, to use a LOAD statement:

For files that exist in the local filesystem:
LOAD DATA LOCAL INPATH '/tmp/datafile.txt' INTO TABLE mytable PARTITION(dt='2010-03-16')

For files that exist in HDFS:
LOAD DATA INPATH '/user/data/datafile.txt' INTO TABLE mytable PARTITION(dt='2010-03-16')

Let me know how things work out if you try it!

- Y

On Wed, Mar 17, 2010 at 1:57 PM, Ryan LeCompte <[email protected]> wrote:

> This is interesting... thanks for the response.
>
> My tables are not defined as "external" tables, however. I wonder if this
> would still work?
>
> Thanks,
> Ryan
>
> On Wed, Mar 17, 2010 at 4:46 PM, Yen Pai <[email protected]> wrote:
>
>> Hi Ryan,
>>
>> I was just experimenting with this recently and this is my experience
>> with "external" tables. I would imagine regular tables work similarly.
>>
>> In Hive a partition is actually a folder in HDFS, so if you put another
>> file in the partition folder, formatted according to the original table
>> definition, you are in effect "appending" to the partition.
>>
>> For example, if your table exists as:
>> /user/hive/warehouse/mytable/
>>
>> And you have a partition folder:
>> /user/hive/warehouse/mytable/2010-03-16/
>>
>> With data files inside it:
>> /user/hive/warehouse/mytable/2010-03-16/data1
>> /user/hive/warehouse/mytable/2010-03-16/data2
>>
>> You can just put more files in the partition folder in HDFS (data3,
>> data4, etc.) and they will be recognized as part of the partition.
>>
>> - Yen
>>
>> On Wed, Mar 17, 2010 at 1:05 PM, Ryan LeCompte <[email protected]> wrote:
>>
>>> Actually, I wasn't clear earlier... we are currently using this syntax
>>> for loading data into the table/partition:
>>>
>>> INSERT OVERWRITE TABLE ourtable PARTITION(dt='2010-03-16') ...
>>>
>>> If I execute this multiple times, I believe the data will simply be
>>> overwritten instead of appended, right?
>>>
>>> On Wed, Mar 17, 2010 at 4:01 PM, Ryan LeCompte <[email protected]> wrote:
>>>
>>>> Awesome! I didn't know this. :) I'll give it a shot, thanks!
>>>>
>>>> On Wed, Mar 17, 2010 at 3:57 PM, Edward Capriolo <[email protected]> wrote:
>>>>
>>>>> On Wed, Mar 17, 2010 at 3:30 PM, Ryan LeCompte <[email protected]> wrote:
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> Is it possible in Hive 0.5 to run multiple inserts into the same Hive
>>>>>> table/partition? Or is this not supported due to the fact that Hadoop
>>>>>> doesn't support appends properly?
>>>>>>
>>>>>> For example, it would be nice to periodically add new data every 5
>>>>>> minutes to a table that has a partition column for "date" via multiple
>>>>>> periodic INSERT statements.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Ryan
>>>>>>
>>>>> Ryan,
>>>>>
>>>>> Every file inside the partition makes up the partition. So with 'LOAD
>>>>> DATA INPATH (X)', if X is a unique name it will be "appended".
>>>>>
>>>>> This works for us since our 5 minute log files all have unique names.
>>>>>
>>>>> Edward
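Putting Edward's point about unique file names together with the 5-minute use case above, the periodic append would look roughly like the following sketch (the log file names are made up for illustration; the only requirement is that each load brings in a file whose name isn't already sitting in the partition folder):

-- each 5-minute file has a unique name, so successive loads accumulate in the partition
LOAD DATA INPATH '/user/data/logs-20100316-1200.txt' INTO TABLE mytable PARTITION(dt='2010-03-16');
LOAD DATA INPATH '/user/data/logs-20100316-1205.txt' INTO TABLE mytable PARTITION(dt='2010-03-16');

Because neither statement uses the OVERWRITE keyword, each one just moves its file into the partition folder alongside the files already there, instead of replacing the partition contents the way a repeated INSERT OVERWRITE would.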
