Hi all,

One of the issues that trips up new Kudu users is uncertainty about how partitioning works and how to use it effectively. Much of this can be addressed with better documentation and explanatory material, and that should be an area of focus leading up to our 1.0 release. However, the default partitioning behavior is suboptimal, and changing it could significantly reduce user confusion and frustration. Currently, when a new table is created without partitioning, Kudu defaults to a single tablet. This is a known anti-pattern, because all of the table's data and write load end up in that one tablet. It is painful for users who create a table assuming Kudu has sensible defaults and begin loading data, only to find out later that they must recreate the table with partitioning to get good results.
A better default partitioning strategy might be hash partitioning over the primary key columns, with the number of hash buckets based on the number of tablet servers (perhaps something like 3x the number of tablet servers). This would alleviate the worst scalability issues with the current default; however, it has downsides of its own. Hash partitioning is not appropriate for every use case (for example, a scan over a range of primary key values has to visit every hash bucket), and any rule-of-thumb number of tablets we could come up with will not always be optimal.

Given that there is no bulletproof default, that a table's partitioning cannot be changed after it is created, and that changing the default partitioning strategy would be a backwards-incompatible change, I propose we remove the default altogether. Users would be required to specify the table's partitioning explicitly at creation time; failing to do so would result in an illegal argument error. Users who really do want a single tablet could still get one by explicitly configuring range partitioning with no split rows.

I'd like to get community feedback on whether this seems like a good direction to take. I have put together a patch; the changes to the test files show what it looks like to add partitioning explicitly in places that relied on the default:

http://gerrit.cloudera.org:8080/#/c/3131/

- Dan
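To make the proposal concrete, here is a minimal Python sketch of the rules described above. It does not use the Kudu client API. `PartitionSchema`, `suggested_num_buckets`, `validate`, `num_tablets` and `bucket_for` are hypothetical names, CRC32 stands in for whatever hash function Kudu actually uses, and `ValueError` plays the role of the illegal argument error. It assumes a single level of hash partitioning, so the tablet count is the number of hash buckets times the number of range partitions (split rows + 1).

```python
# Hypothetical model of the proposed rules; this is NOT the Kudu client API.
import zlib
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PartitionSchema:
    hash_columns: List[str] = field(default_factory=list)  # empty: no hash partitioning
    num_buckets: int = 0
    range_columns: Optional[List[str]] = None  # None: range partitioning not configured
    split_rows: List[Tuple] = field(default_factory=list)


def suggested_num_buckets(num_tablet_servers: int, factor: int = 3) -> int:
    """Rule of thumb floated in the proposal: roughly 3x the tablet servers."""
    return max(1, factor * num_tablet_servers)


def validate(schema: PartitionSchema) -> None:
    """Proposed behavior: no implicit single-tablet default."""
    if not schema.hash_columns and schema.range_columns is None:
        raise ValueError("table partitioning must be specified explicitly")
    if schema.hash_columns and schema.num_buckets < 1:
        raise ValueError("hash partitioning needs a positive bucket count")
    if schema.split_rows and schema.range_columns is None:
        raise ValueError("split rows require range partitioning")


def num_tablets(schema: PartitionSchema) -> int:
    """Tablets = hash buckets x range partitions (split rows + 1); one hash level assumed."""
    validate(schema)
    hash_factor = schema.num_buckets if schema.hash_columns else 1
    return hash_factor * (len(schema.split_rows) + 1)


def bucket_for(primary_key: Tuple, num_buckets: int) -> int:
    """Assign a row to a hash bucket; CRC32 is a stand-in, not Kudu's hash function."""
    return zlib.crc32(repr(primary_key).encode("utf-8")) % num_buckets


if __name__ == "__main__":
    # Hash partitioning on the primary key for a 4-server cluster: 12 buckets.
    hashed = PartitionSchema(hash_columns=["id"], num_buckets=suggested_num_buckets(4))
    print(num_tablets(hashed))  # 12
    # Explicit opt-in to a single tablet: range partitioning with no split rows.
    single = PartitionSchema(range_columns=["id"])
    print(num_tablets(single))  # 1
    try:
        num_tablets(PartitionSchema())  # no partitioning specified
    except ValueError as e:
        print("rejected:", e)
```

The key piece is `validate`: a missing partitioning spec is treated as an error rather than silently falling back to one tablet, so a user who wants a single tablet has to ask for it.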
