I've got it, thank you!

On Thu, Jan 8, 2009 at 9:17 PM, Jeremy Chow <[email protected]> wrote:

> Is that the same thing as hash partitioning?
>
>
> On Thu, Jan 8, 2009 at 4:52 PM, Jeff Hammerbacher <[email protected]> wrote:
>
>> Hey Jeremy,
>>
>> Hive stores each "table" inside of HDFS in a folder. For example, all of
>> your weblogs could be stored in a folder called "/hive/weblogs". If you want
>> to partition those weblogs by day, you can use the PARTITIONED BY clause on
>> the CREATE TABLE statement to create a subfolder for each new day, e.g.
>> "/hive/weblogs/ds=2009-01-08". If you wanted to further partition a day's
>> logfiles by userid, for example, Hive can hash partition your logfiles into
>> "buckets" (subfolders) inside that day's folder, e.g.
>> "/hive/weblogs/ds=2009-01-08/0001", where 0001 is the name of the bucket. To
>> indicate your desire to have buckets, use the CLUSTERED BY clause on the
>> CREATE TABLE statement (see
>> http://wiki.apache.org/hadoop/Hive/HiveQL#head-6fb42f2747383d4375e56cc31bbae68860c88a3d
>> ).
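
The hash-partitioning idea described above can be sketched in a few lines of Python. This is a minimal sketch, not Hive's implementation. It assumes a Java-style `String.hashCode()` stands in for the hash Hive applies to a string `userid` column, and it maps each row to bucket `(hash & 0x7FFFFFFF) % num_buckets`. The helpers `java_string_hash`, `bucket_for` and `bucket_rows` are hypothetical, and so is the `<partition dir>/<4-digit bucket>` key, which only mirrors the example path above:

```python
from collections import defaultdict


def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode() as a signed 32-bit int.

    Assumption: stands in for the hash Hive applies to a string column;
    exact for BMP characters such as ASCII user ids.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def bucket_for(userid: str, num_buckets: int) -> int:
    """Clear the sign bit, then take the hash modulo the bucket count."""
    return (java_string_hash(userid) & 0x7FFFFFFF) % num_buckets


def bucket_rows(rows, ds, num_buckets, root="/hive/weblogs"):
    """Group one day's (userid, line) rows by bucket.

    The "<partition dir>/<4-digit bucket>" key is hypothetical and only
    mirrors the example path in the text; real bucket naming may differ.
    """
    partition_dir = f"{root}/ds={ds}"
    layout = defaultdict(list)
    for userid, line in rows:
        bucket = bucket_for(userid, num_buckets)
        layout[f"{partition_dir}/{bucket:04d}"].append(line)
    return dict(layout)


if __name__ == "__main__":
    rows = [("alice", "GET /"), ("bob", "GET /about"), ("alice", "GET /faq")]
    for path, lines in sorted(bucket_rows(rows, "2009-01-08", 4).items()):
        print(path, lines)
```

The property that matters is determinism. Every row for a given userid lands in the same bucket, so a day's partition is split into a fixed number of roughly even, per-user groups.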
>>
>> You can also use buckets with the TABLESAMPLE operator to run Hive queries
>> over subsets of your data; this is useful for rapidly prototyping new
>> analyses. See
>> http://wiki.apache.org/hadoop/Hive/HiveQL#head-c7c5e4391816048d290eb70091487b4f91beebc9
>> for the TABLESAMPLE syntax.
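
The sampling behaviour can also be modelled in Python, as a rough illustration of what `TABLESAMPLE(BUCKET x OUT OF y ON col)` selects. This is a sketch under stated assumptions, not Hive's code. Buckets are numbered 1..y, a row is kept when its hashed key falls in bucket x, and a Java-style string hash again stands in for Hive's column hash. The helper names `java_string_hash` and `tablesample_bucket` are hypothetical:

```python
def java_string_hash(s: str) -> int:
    """Java-style String.hashCode() as a signed 32-bit int.

    Assumption: stands in for the hash Hive applies to the sampling column.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def tablesample_bucket(rows, x, y, key):
    """Keep rows whose key hashes into bucket x of y (buckets numbered 1..y).

    `key` extracts the sampling column from a row. This models the
    semantics over an in-memory list; it does not read any files.
    """
    if not 1 <= x <= y:
        raise ValueError("need 1 <= x <= y")
    return [r for r in rows
            if (java_string_hash(key(r)) & 0x7FFFFFFF) % y == x - 1]


if __name__ == "__main__":
    rows = [(f"user{i}", f"GET /page{i}") for i in range(100)]
    sample = tablesample_bucket(rows, 1, 4, key=lambda r: r[0])
    print(len(sample), "of", len(rows), "rows sampled")
```

Because the y samples are disjoint and together cover every row, each one is a repeatable slice of about 1/y of the data. When a table is already clustered into y buckets on the same column, the slice matches one existing bucket, so presumably only that bucket's data needs to be read rather than the whole table.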
>>
>> Hive folks: in case I butchered that, feel free to jump in with a more
>> correct explanation. If it's correct, I'll toss it on the wiki. It would be
>> good to have actual HiveQL statements using buckets in the Getting Started
>> guide too, I'd imagine.
>>
>> Later,
>> Jeff
>>
>>
>> On Thu, Jan 8, 2009 at 12:21 AM, Jeremy Chow <[email protected]> wrote:
>>
>>> Hi list,
>>>
>>> I came across the term "bucket" while reading the Hive source code. What
>>> does it mean?
>>>
>>> Thanks,
>>> Jeremy
>>> --
>>> My research interests are distributed systems, parallel computing and
>>> bytecode-based virtual machines.
>>>
>>> http://coderplay.javaeye.com
>>>
>>
>>
>
>


