+1 ...this is a great recommendation -----Original Message----- From: Sand Stone [mailto:[email protected]] Sent: Thursday, May 19, 2016 10:39 PM To: [email protected] Cc: [email protected] Subject: Re: Proposal: remove default partitioning for new tables
Agreed that this is a sensible API change. On Thu, May 19, 2016 at 4:07 PM, Abhi Basu <[email protected]> wrote: > I think this a very reasonable feature request. I have recently started > working with Kudu and the "default" behavior has already tripped me up a > couple times. > > Thanks, > > Abhi > > On Thu, May 19, 2016 at 4:03 PM, Dan Burkert <[email protected]> > wrote: > >> Hi all, >> >> One of the issues that trips up new Kudu users is the uncertainty about >> how partitioning works, and how to use partitioning effectively. Much of >> this can be addressed with better documentation and explanatory materials, >> and that should be an area of focus leading up to our 1.0 release. However, >> the default partitioning behavior is suboptimal, and changing the default >> could lead to significantly less user confusion and frustration. Currently, >> when creating a new table, Kudu defaults to using only a single tablet, >> which is a known anti-pattern. This can be painful for users who create a >> table assuming Kudu will have good defaults, and begin loading data only to >> find out later that they will need to recreate the table with partitioning >> to achieve good results. >> >> A better default partitioning strategy might be hash partitioning over >> the primary key columns, with a number of hash buckets based on the number >> of tablet servers (perhaps something like 3x the number of tablet >> servers). This would alleviate the worst scalability issues with the >> current default, however it has a few downsides of its own. Hash >> partitioning is not appropriate for every use case, and any rule-of-thumb >> number of tablets we could come up with will not always be optimal. >> >> Given that there is no bullet-proof default, and that changing >> partitioning strategy after table creation is impossible, and changing the >> default partitioning strategy is a backwards incompatible change, I propose >> we remove the default altogether. Users would be required to explicitly >> specify the table partitioning during creation, and failing to do so would >> result in an illegal argument error. Users who really do want only a >> single tablet will still be able to do so by explicitly configuring range >> partitioning with no split rows. >> >> I'd like to get community feedback on whether this seems like a good >> direction to take. I have put together a patch, you can check out the >> changes to test files to see what it looks like to add partitioning >> explicitly in cases where the default was being relied on. >> http://gerrit.cloudera.org:8080/#/c/3131/ >> >> - Dan >> > > > > -- > Abhi Basu >
