1) I think that makes sense, though we need to know what the error conditions are and when those errors occur (e.g. at table creation, add column, or writing data), and we need tests to validate the expected negative cases. I can certainly guess, but I'd like the affected API calls and their new expected behavior to be documented somewhere so we can make changes accordingly.

2) Does this mean you'll test up to these limits?
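On point 1, here is a rough sketch of the kind of negative cases in question, using the proposed defaults from the quoted message below. Everything in it is a guess for illustration: the function names and the `LimitExceeded` error type are hypothetical (not Kudu or Impala API), the limits are assumed to be inclusive, identifier length is assumed to be counted in characters, and "64kb" is assumed to mean 64 * 1024 bytes.

```python
# Proposed defaults from the quoted message (KUDU-1775); assumed, not final.
MAX_COLUMNS = 300
MAX_REPLICATION_FACTOR = 7
MAX_IDENTIFIER_LENGTH = 256          # assumed to be counted in characters
MAX_CELL_SIZE_BYTES = 64 * 1024      # assumed meaning of "64kb"


class LimitExceeded(ValueError):
    """Hypothetical error type; the real error surfaced may differ."""


def check_identifier(name):
    # Applies to both table names and column names.
    if len(name) > MAX_IDENTIFIER_LENGTH:
        raise LimitExceeded(
            f"identifier has {len(name)} characters, max is {MAX_IDENTIFIER_LENGTH}")


def check_create_table(table_name, column_names, replication_factor):
    # Guess: all schema-level limits are checked at table creation.
    check_identifier(table_name)
    if len(column_names) > MAX_COLUMNS:
        raise LimitExceeded(
            f"table has {len(column_names)} columns, max is {MAX_COLUMNS}")
    for column_name in column_names:
        check_identifier(column_name)
    if replication_factor > MAX_REPLICATION_FACTOR:
        raise LimitExceeded(
            f"replication factor {replication_factor} exceeds {MAX_REPLICATION_FACTOR}")


def check_add_column(current_column_count, new_column_name):
    # Guess: adding a column re-checks the name and the column count.
    check_identifier(new_column_name)
    if current_column_count + 1 > MAX_COLUMNS:
        raise LimitExceeded(f"adding a column would exceed {MAX_COLUMNS} columns")


def check_cell_value(value):
    # Guess: binary/string cell size is checked when data is written.
    if len(value) > MAX_CELL_SIZE_BYTES:
        raise LimitExceeded(
            f"cell is {len(value)} bytes, max is {MAX_CELL_SIZE_BYTES}")
```

If the real behavior differs (for example, if an oversized cell is rejected per row rather than failing the whole write), that is exactly the kind of detail we'd need documented.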
Thanks

On Wed, Nov 30, 2016 at 3:33 PM, Todd Lipcon <[email protected]> wrote:

> Hey Impala folks,
>
> Just FYI, I started the below thread on the Kudu lists about adding some
> limits/guard rails to various dimensions of Kudu data/metadata. Please take
> a look from the Impala perspective and let us know if you foresee any
> issues with these limits.
>
> Just to repeat one thing: I know many SQL workloads require more than 300
> columns in a table, but right now Kudu isn't great in that realm, so we're
> setting the limits conservatively. The idea is that over time as we improve
> test coverage we'll raise the limits.
>
> -Todd
>
> ---------- Forwarded message ----------
> From: Todd Lipcon <[email protected]>
> Date: Wed, Nov 30, 2016 at 3:30 PM
> Subject: Re: Adding some guard rails to Kudu
> To: [email protected], dev <[email protected]>
>
>
> BTW I filed a JIRA here and started linking related issues to it:
> https://issues.apache.org/jira/browse/KUDU-1775
>
>
> On Wed, Nov 30, 2016 at 3:25 PM, Todd Lipcon <[email protected]> wrote:
>
>> Hey folks,
>>
>> I've started working on a few patches to add "guard rails" to various
>> user-specified dimensions in Kudu. In particular, I'm planning to add
>> limits to the following:
>>
>> - max number of columns in a table (proposal: 300)
>> - max replication factor (proposal: 7)
>> - max table name or column name length (proposal: 256)
>> - max size of a binary/string column cell value (proposal: 64kb)
>>
>> The reasoning is that, even though in some cases we don't know a specific
>> issue that will happen outside these limits, we've done very little testing
>> (and have no automated testing) outside of these ranges. In some cases, we
>> do know that there is a certain threshold that will cause a big problem (eg
>> large cell sizes can cause tablet servers to crash). In other cases, it's
>> just "unknown territory".
>>
>> In all cases, I'm planning on making the limits overridable via an
>> "unsafe" configuration flag. That means that a user can run with
>> "--unlock_unsafe_flags --max_identifier_length=1000" if they want to, but
>> they're explicitly accepting some risk that they're entering untested
>> territory.
>>
>> Of course, in all cases, if we hear that there are people who are bumping
>> the maxes higher than the defaults and having good results, we can consider
>> raising the maximum, but I think it's smarter to start conservatively low
>> and raise later as we increase test coverage. Also, I'm sure down the road
>> we'll add features such as BLOB support or sparse column support, and at
>> that time we can remove the corresponding guard rails.
>>
>> I'm sending this note to both user@ and dev@ to solicit feedback. Are
>> there any other dimensions people can think of where we should probably add
>> guard-rails? Is anyone out there already outside of the above ranges and
>> can make a case that we're being too conservative?
>>
>> Thanks
>> -Todd
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
