Does that mean the same thing as hash partitioning?
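
For concreteness, here is a minimal sketch of the bucketing described in the quoted message below. It is an illustration, not Hive's implementation: the thread does not say which hash function Hive uses or how it numbers buckets, so a plain modulo on an integer userid and a zero-based, four-digit bucket name are assumptions. The helper names bucket_for and bucket_path are hypothetical.

```python
# Sketch only: Hive's real hash function and bucket numbering are not given
# in this thread; a plain modulo on an integer userid stands in for them.


def bucket_for(userid: int, num_buckets: int) -> int:
    """Return a bucket index in [0, num_buckets) for an integer userid."""
    return userid % num_buckets


def bucket_path(table_dir: str, ds: str, userid: int, num_buckets: int) -> str:
    """Build a path shaped like /hive/weblogs/ds=2009-01-08/0001."""
    bucket = bucket_for(userid, num_buckets)
    # Zero-padded to four digits to match the "0001" example in the thread.
    return f"{table_dir}/ds={ds}/{bucket:04d}"


if __name__ == "__main__":
    # Show which bucket path a few sample userids map to under these assumptions.
    for uid in (32, 33, 65):
        print(uid, bucket_path("/hive/weblogs", "2009-01-08", uid, 32))
```

Under these assumptions every row with the same userid lands in the same bucket within a given day, which is what hash partitioning usually means.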

On Thu, Jan 8, 2009 at 4:52 PM, Jeff Hammerbacher <[email protected]> wrote:

> Hey Jeremy,
>
> Hive stores each "table" inside of HDFS in a folder. For example, all of
> your weblogs could be stored in a folder called "/hive/weblogs". If you want
> to partition those weblogs by day, you can use the PARTITIONED BY clause on
> the CREATE TABLE statement to create a subfolder for each new day, e.g.
> "/hive/weblogs/ds=2009-01-08". If you wanted to further partition a day's
> logfiles by userid, for example, Hive can hash partition your logfiles into
> "buckets" (subfolders) inside that day's folder, e.g.
> "/hive/weblogs/ds=2009-01-08/0001", where 0001 is the name of the bucket. To
> indicate your desire to have buckets, use the CLUSTERED BY clause on the
> CREATE TABLE statement (see
> http://wiki.apache.org/hadoop/Hive/HiveQL#head-6fb42f2747383d4375e56cc31bbae68860c88a3d
> ).
>
> You can also use buckets with the TABLESAMPLE operator to run Hive queries
> over subsets of your data; this is useful for rapidly prototyping new
> analyses. See
> http://wiki.apache.org/hadoop/Hive/HiveQL#head-c7c5e4391816048d290eb70091487b4f91beebc9
> for the TABLESAMPLE syntax.
>
> Hive folks: in case I butchered that, feel free to jump in with a more
> correct explanation. If it's correct, I'll toss it on the wiki. It would be
> good to have actual HiveQL statements using buckets on the getting started
> guide too, I'd imagine.
>
> Later,
> Jeff
>
>
> On Thu, Jan 8, 2009 at 12:21 AM, Jeremy Chow <[email protected]> wrote:
>
>> Hi list,
>>
>> I came across the term "bucket" while reading the Hive source code. What does it mean?
>>
>> Thanks,
>> Jeremy
>> --
>> My research interests are distributed systems, parallel computing and
>> bytecode based virtual machine.
>>
>> http://coderplay.javaeye.com
>>
>
>


-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

http://coderplay.javaeye.com
