This change was intentional: I wanted to remove the defaults thoroughly in one go. If we later decide it's too much of a burden, we can always add back implicit range columns for the case where range splits are specified. I personally like the symmetry with hash partitioning, where setting the columns explicitly is required. It also makes it clear that you _do not_ get range partitioning unless you set the range partitioning columns. I can see it being somewhat confusing if we implicitly set range partitioning columns when there are range splits, but not when there are hash partitions.
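To make the rule concrete, here is a minimal, self-contained sketch of the validation described above. It is a hypothetical model, not the Kudu client: the `PartitioningSketch` class, its `Options` type, the `String` split keys, the error messages, and the tablet-count arithmetic are all assumptions made for illustration. Only the method names `setRangePartitionColumns` and `addSplitRow` are borrowed from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model for illustration only. This is NOT the Kudu client;
// the types, messages, and tablet arithmetic are assumptions of this sketch.
final class PartitioningSketch {

    /** Minimal stand-in for table-creation options. */
    static final class Options {
        private final List<String> rangeColumns = new ArrayList<>();
        private final List<String> splitRows = new ArrayList<>();
        private int hashBuckets = 0; // 0 means no hash partitioning was requested

        Options setRangePartitionColumns(List<String> columns) {
            rangeColumns.clear();
            rangeColumns.addAll(columns);
            return this;
        }

        // Simplification: a later call overwrites an earlier one.
        Options addHashPartitions(List<String> columns, int buckets) {
            if (columns.isEmpty() || buckets < 2) {
                throw new IllegalArgumentException(
                    "hash partitioning needs columns and at least 2 buckets");
            }
            hashBuckets = buckets;
            return this;
        }

        Options addSplitRow(String splitKey) {
            splitRows.add(splitKey);
            return this;
        }

        /** Rejects any configuration that relies on an implicit default. */
        void validate() {
            // Split rows alone do not imply range columns (the case Todd hit).
            if (!splitRows.isEmpty() && rangeColumns.isEmpty()) {
                throw new IllegalArgumentException(
                    "split rows given, but no range partition columns were specified");
            }
            // No partitioning at all is also rejected.
            if (rangeColumns.isEmpty() && hashBuckets == 0) {
                throw new IllegalArgumentException("table partitioning must be specified");
            }
        }

        /** Tablets in this model: hash buckets times range partitions. */
        int tabletCount() {
            validate();
            int rangePartitions = splitRows.size() + 1;
            int buckets = (hashBuckets == 0) ? 1 : hashBuckets;
            return buckets * rangePartitions;
        }
    }

    public static void main(String[] args) {
        // Split rows without range columns: rejected in this model.
        try {
            new Options().addSplitRow("m").addSplitRow("t").validate();
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // Same split rows, with the range columns set explicitly: accepted.
        Options explicit = new Options()
            .setRangePartitionColumns(Arrays.asList("key"))
            .addSplitRow("m")
            .addSplitRow("t");
        System.out.println("tablets: " + explicit.tabletCount());
    }
}
```

In this model, a single-tablet table is still possible, but only by setting the range partition columns explicitly and adding no split rows.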
- Dan

On Thu, Jun 2, 2016 at 8:40 AM, Todd Lipcon <[email protected]> wrote:
> Hey Dan,
>
> One quick thing I just stumbled upon... it seems like the old behavior was
> that you could do the following:
>
> CreateTableOptions builder = new CreateTableOptions();
> builder.addSplitRow(...);
> builder.addSplitRow(...);
> ...
> client.createTable("foo", schema, builder);
>
> and it would assume that this was range partitioning based on the whole
> primary key. The user in this case _is_ specifying split rows, so I figured
> this counted as an explicit partitioning choice and thus wouldn't be
> affected by the change mentioned above. Instead, I'm getting an error that
> no range partition columns were specified.
>
> Was this on purpose? Of course I can call setRangePartitionColumns to work
> around it, but didn't know if it was intentional.
>
> -Todd
>
> On Thu, May 26, 2016 at 11:53 PM, Dan Burkert <[email protected]> wrote:
>
> > Hi all,
> >
> > Thanks for the feedback! We've made this change, and it will be part of
> > the upcoming 0.9 release. Going forward, all create table calls must have
> > partitioning specified. Existing tables will not be affected.
> >
> > - Dan
> >
> > On Fri, May 20, 2016 at 6:41 AM, Jordan Birdsell <[email protected]> wrote:
> >
> > > +1 ...this is a great recommendation
> > >
> > > -----Original Message-----
> > > From: Sand Stone [mailto:[email protected]]
> > > Sent: Thursday, May 19, 2016 10:39 PM
> > > To: [email protected]
> > > Cc: [email protected]
> > > Subject: Re: Proposal: remove default partitioning for new tables
> > >
> > > Agreed that this is a sensible API change.
> > >
> > > On Thu, May 19, 2016 at 4:07 PM, Abhi Basu <[email protected]> wrote:
> > >
> > > > I think this is a very reasonable feature request. I have recently
> > > > started working with Kudu and the "default" behavior has already
> > > > tripped me up a couple times.
> > > >
> > > > Thanks,
> > > >
> > > > Abhi
> > > >
> > > > On Thu, May 19, 2016 at 4:03 PM, Dan Burkert <[email protected]>
> > > > wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> One of the issues that trips up new Kudu users is the uncertainty
> > > >> about how partitioning works, and how to use partitioning
> > > >> effectively. Much of this can be addressed with better documentation
> > > >> and explanatory materials, and that should be an area of focus
> > > >> leading up to our 1.0 release. However, the default partitioning
> > > >> behavior is suboptimal, and changing the default could lead to
> > > >> significantly less user confusion and frustration. Currently, when
> > > >> creating a new table, Kudu defaults to using only a single tablet,
> > > >> which is a known anti-pattern. This can be painful for users who
> > > >> create a table assuming Kudu will have good defaults, and begin
> > > >> loading data only to find out later that they will need to recreate
> > > >> the table with partitioning to achieve good results.
> > > >>
> > > >> A better default partitioning strategy might be hash partitioning
> > > >> over the primary key columns, with a number of hash buckets based on
> > > >> the number of tablet servers (perhaps something like 3x the number of
> > > >> tablet servers). This would alleviate the worst scalability issues
> > > >> with the current default, however it has a few downsides of its own.
> > > >> Hash partitioning is not appropriate for every use case, and any
> > > >> rule-of-thumb number of tablets we could come up with will not always
> > > >> be optimal.
> > > >>
> > > >> Given that there is no bullet-proof default, and that changing
> > > >> partitioning strategy after table creation is impossible, and
> > > >> changing the default partitioning strategy is a backwards
> > > >> incompatible change, I propose we remove the default altogether.
> > > >> Users would be required to explicitly specify the table partitioning
> > > >> during creation, and failing to do so would result in an illegal
> > > >> argument error. Users who really do want only a single tablet will
> > > >> still be able to do so by explicitly configuring range partitioning
> > > >> with no split rows.
> > > >>
> > > >> I'd like to get community feedback on whether this seems like a good
> > > >> direction to take. I have put together a patch; you can check out the
> > > >> changes to test files to see what it looks like to add partitioning
> > > >> explicitly in cases where the default was being relied on.
> > > >> http://gerrit.cloudera.org:8080/#/c/3131/
> > > >>
> > > >> - Dan
> > > >>
> > > >
> > > > --
> > > > Abhi Basu
> > > >
> > >
> >
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
