How is this different from partitioning?

On Sun, 27 Nov 2016 at 11:21 PM, Ravindra Pesala <[email protected]> wrote:
> Hi All,
>
> The bucketing concept is based on hash partitioning the bucketed column
> according to the configured number of buckets. Records with the same
> bucketed column value always go to the same bucket. Physically, each
> bucket is one or more files in the table directory.
>
> Advantages
> A bucketed table is useful for map-side joins and avoids shuffling of
> data.
> Carbondata can do driver-level pruning on the bucketed column to improve
> query performance.
>
> A user can create a bucketed table as follows:
>
> CREATE TABLE test(user_id BIGINT, firstname STRING, lastname STRING)
> CLUSTERED BY(user_id) INTO 32 BUCKETS;
>
> In the above example, column user_id is hash partitioned into 32
> bucket/partition files in carbondata. So when joining with another table
> on the bucketed column, it can select the same buckets and do the join
> without shuffling.
>
> Carbon currently creates the following folder structure, since carbon
> already supports partitioning in its file format:
>
> dbName -> tableName -> Fact ->
>     Part0 -> Segment_id -> carbondatafiles
>     Part1 -> Segment_id -> carbondatafiles
>
> We can also move the partition ID to the file metadata. But if we move
> the partition ID to metadata, there would be complications in backward
> compatibility.
>
> --
> Thanks & Regards,
> Ravindra
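
For reference, the hashing step described in the quoted proposal can be sketched in a few lines of Python. This is only an illustration, not Carbondata's implementation: the helper names (bucket_id, bucketize, bucket_join) are hypothetical, and plain modulo stands in for whatever hash function the engine actually uses. It shows that rows with the same user_id always land in the same bucket, so a join only has to compare buckets with the same number.

```python
from collections import defaultdict

NUM_BUCKETS = 32  # mirrors "INTO 32 BUCKETS" in the quoted DDL


def bucket_id(user_id, num_buckets=NUM_BUCKETS):
    """Map a key to a bucket number (modulo is an assumed stand-in hash)."""
    return user_id % num_buckets


def bucketize(rows, num_buckets=NUM_BUCKETS):
    """Group rows by the bucket of their first column (the bucketed column)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_id(row[0], num_buckets)].append(row)
    return buckets


def bucket_join(left, right):
    """Join two bucketized tables, pairing only buckets with the same number."""
    joined = []
    for b in sorted(left.keys() & right.keys()):
        # Build a small lookup for this bucket of the right-hand table.
        index = defaultdict(list)
        for row in right[b]:
            index[row[0]].append(row)
        # Probe it with the matching bucket of the left-hand table.
        for row in left[b]:
            for match in index.get(row[0], []):
                joined.append(row + match[1:])
    return joined


users = bucketize([(1, "Ann", "Lee"), (33, "Bob", "Ray"), (2, "Cy", "Orr")])
orders = bucketize([(33, "book"), (2, "pen"), (7, "ink")])
result = bucket_join(users, orders)
```

Here user_id 1 and 33 share bucket 1, and the order for user_id 7 sits in a bucket the users table never touches, so it is never read during the join.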
