Re: About bucket feature in carbon

Jacky Li Fri, 09 Feb 2018 04:33:10 -0800

Hi Ravindra,

You mean we can do one round of refactoring of the bucketed table feature in 
CarbonData 1.4?
I am fine with it.


Regards,
Jacky


> On Feb 9, 2018, at 3:49 PM, Ravindra Pesala <ravi.pes...@gmail.com> wrote:
> 
> Hi Likun,
> 
> I feel it is better to change the implementation to use Spark's bucketing
> generation, just like how standard Hive partitions are generated. It will be
> easy to change once the partition feature is implemented. It is also a
> useful feature for joining big tables, since hash-based buckets and clustered
> by make the queries faster. So it is better to change the
> implementation instead of removing it.
> 
> Regards,
> Ravindra.
> 
> On 9 February 2018 at 13:14, Jacky Li <jacky.li...@qq.com> wrote:
> 
>> Hi,
>> 
>> One year ago, CarbonData 1.0.0 introduced the bucketed table feature. It was
>> expected to improve join performance by avoiding shuffling if both tables
>> are bucketed on the same column with the same number of buckets.
>> 
>> However, since this feature was introduced, personally speaking, it has not
>> been widely used in the community, and it creates maintenance overhead for
>> the developers in the community (for every new Pull Request, all
>> bucket-related test cases need to be fixed).
>> 
>> And now that carbon has integrated with Spark standard partitions, developers
>> can add bucket support using the Spark bucketed table feature in the future
>> if it is required.
>> 
>> So, I propose to remove the bucket feature after the CarbonData 1.3.0 version.
>> What do you think?
>> 
>> Regards,
>> Jacky
>> 
>> 
> 
> 
> -- 
> Thanks & Regards,
> Ravi
