Re: [New Feature] Adding bucketed table feature to Carbondata

Raghunandan S Sun, 27 Nov 2016 09:54:58 -0800

How is this different from partitioning?
On Sun, 27 Nov 2016 at 11:21 PM, Ravindra Pesala <[email protected]>
wrote:


> Hi All,
>
> Bucketing hash-partitions a table on the bucketed column into the
> configured number of buckets. Records with the same bucketed column value
> always go to the same bucket. Physically, each bucket is one or more files
> in the table directory.
> Advantages
> A bucketed table is useful for map-side joins, since it avoids shuffling
> data.
> Carbondata can also do driver-level pruning on the bucketed column to
> improve query performance.
>
> A user can create a bucketed table as follows:
>
> CREATE TABLE test(user_id BIGINT, firstname STRING, lastname STRING)
> CLUSTERED BY(user_id) INTO 32 BUCKETS;
>
> In the above example, column user_id is hash partitioned into 32
> bucket/partition files in carbondata. So when joining with another table
> on the bucketed column, it can select the same buckets and do the join
> without shuffling.
>
> Carbon currently creates the following folder structure, since it
> already supports partitioning in its file format:
>
> dbName -> tableName -> Fact ->
>     Part0 -> Segment_id -> carbondatafiles
>     Part1 -> Segment_id -> carbondatafiles
>
> We can also move the partitionId to file metadata. But if we do, there
> would be complications in backward compatibility.
> --
> Thanks & Regards,
> Ravindra
>
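
For readers following the thread, the bucketing and shuffle-free join described in the quoted proposal can be sketched roughly as below. This is only an illustration, not Carbondata code: the modulo hash, the in-memory dict standing in for bucket files, and the helper names (bucket_id, write_buckets, bucketed_join) are all assumptions made for the example. The thread does not say which hash function Carbondata would use.

```python
from collections import defaultdict

NUM_BUCKETS = 32  # mirrors "INTO 32 BUCKETS" in the quoted DDL


def bucket_id(key: int, num_buckets: int = NUM_BUCKETS) -> int:
    # Assumed hash for illustration: plain modulo on the integer key.
    return key % num_buckets


def write_buckets(rows, num_buckets: int = NUM_BUCKETS):
    # Group rows by the bucket of their first column (the bucketed column).
    # Each dict entry stands in for one bucket's file(s) on disk.
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row[0], num_buckets)].append(row)
    return buckets


def bucketed_join(left, right):
    # Join bucket N of one table only with bucket N of the other.
    # Rows never move between buckets, which is the "no shuffle" property.
    out = []
    for b in left.keys() & right.keys():
        index = defaultdict(list)
        for row in right[b]:
            index[row[0]].append(row)
        for lrow in left[b]:
            for rrow in index.get(lrow[0], []):
                out.append(lrow + rrow[1:])
    return out


users = write_buckets([(1, "Ann", "Lee"), (33, "Bob", "Ray"), (2, "Cy", "Ito")])
orders = write_buckets([(33, "book"), (2, "pen"), (7, "ink")])
joined = sorted(bucketed_join(users, orders))
```

Driver-level pruning follows the same idea: for a filter such as user_id = 33, only the bucket given by bucket_id(33) needs to be scanned, provided both tables were written with the same hash and bucket count.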
