Hey Impala folks,

Just FYI, I started the thread below on the Kudu lists about adding some limits/guard rails to various dimensions of Kudu data/metadata. Please take a look from the Impala perspective and let us know if you foresee any issues with these limits.

Just to repeat one thing: I know many SQL workloads require more than 300 columns in a table, but right now Kudu isn't great in that realm, so we're setting the limits conservatively. The idea is that over time as we improve test coverage we'll raise the limits.

-Todd

---------- Forwarded message ----------
From: Todd Lipcon <t...@cloudera.com>
Date: Wed, Nov 30, 2016 at 3:30 PM
Subject: Re: Adding some guard rails to Kudu
To: u...@kudu.apache.org, dev <d...@kudu.apache.org>

BTW I filed a JIRA here and started linking related issues to it:
https://issues.apache.org/jira/browse/KUDU-1775

On Wed, Nov 30, 2016 at 3:25 PM, Todd Lipcon <t...@cloudera.com> wrote:

> Hey folks,
>
> I've started working on a few patches to add "guard rails" to various
> user-specified dimensions in Kudu. In particular, I'm planning to add
> limits to the following:
>
> - max number of columns in a table (proposal: 300)
> - max replication factor (proposal: 7)
> - max table name or column name length (proposal: 256)
> - max size of a binary/string column cell value (proposal: 64kb)
>
> The reasoning is that, even though in some cases we don't know a specific
> issue that will happen outside these limits, we've done very little testing
> (and have no automated testing) outside of these ranges. In some cases, we
> do know that there is a certain threshold that will cause a big problem (eg
> large cell sizes can cause tablet servers to crash). In other cases, it's
> just "unknown territory".
>
> In all cases, I'm planning on making the limits overridable via an
> "unsafe" configuration flag. That means that a user can run with
> "--unlock_unsafe_flags --max_identifier_length=1000" if they want to, but
> they're explicitly accepting some risk that they're entering untested
> territory.
>
> Of course, in all cases, if we hear that there are people who are bumping
> the maxes higher than the defaults and having good results, we can consider
> raising the maximum, but I think it's smarter to start conservatively low
> and raise later as we increase test coverage. Also, I'm sure down the road
> we'll add features such as BLOB support or sparse column support, and at
> that time we can remove the corresponding guard rails.
>
> I'm sending this note to both user@ and dev@ to solicit feedback. Are
> there any other dimensions people can think of where we should probably add
> guard-rails? Is anyone out there already outside of the above ranges and
> can make a case that we're being too conservative?
>
> Thanks
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera

--
Todd Lipcon
Software Engineer, Cloudera
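
For anyone who wants to check an existing schema against the proposed defaults before they land, here is a rough client-side sketch. It is not Kudu code: `GuardRails`, `check_table`, and `check_cell` are hypothetical names made up for illustration, "64kb" is assumed to mean 64 * 1024 bytes, and identifier length is assumed to be counted in characters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuardRails:
    """Proposed default limits from the thread (field names are hypothetical)."""
    max_columns: int = 300
    max_replication_factor: int = 7
    max_identifier_length: int = 256      # assumed to be counted in characters
    max_cell_size_bytes: int = 64 * 1024  # assumes "64kb" means 64 KiB


def check_table(table_name, column_names, replication_factor,
                limits=GuardRails()):
    """Return a list of violation messages; an empty list means within limits."""
    problems = []
    if len(column_names) > limits.max_columns:
        problems.append(
            f"{len(column_names)} columns exceeds {limits.max_columns}")
    if replication_factor > limits.max_replication_factor:
        problems.append(
            f"replication factor {replication_factor} exceeds "
            f"{limits.max_replication_factor}")
    # Table and column names share the same identifier length limit.
    for ident in [table_name, *column_names]:
        if len(ident) > limits.max_identifier_length:
            problems.append(
                f"identifier of length {len(ident)} exceeds "
                f"{limits.max_identifier_length}")
    return problems


def check_cell(value: bytes, limits=GuardRails()):
    """Return True if a binary/string cell value fits within the size limit."""
    return len(value) <= limits.max_cell_size_bytes
```

Passing a relaxed `GuardRails(max_identifier_length=1000)` mirrors the idea of overriding a limit, with the same caveat as above: anything past the defaults is untested territory.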