BTW I am doing some testing in Impala, and it seems like Impala silently truncates column names to 128 characters.
Even more fun is: create table todd_test (xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx int, xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyy int) (two long names which differ after the 128-character truncation). Results in: ImpalaRuntimeException: Error making 'createTable' RPC to Hive Metastore: CAUSED BY: MetaException: Add request failed : INSERT INTO `COLUMNS_V2` (`CD_ID`,`COMMENT`,`COLUMN_NAME`,`TYPE_NAME`,`INTEGER_IDX`) VALUES (?,?,?,?,?) So given the 128-character Impala limit, perhaps the Kudu limit should be more than 256, since you're creating Kudu table names as '<database>::<table>'. On Wed, Nov 30, 2016 at 4:03 PM, Todd Lipcon <[email protected]> wrote: > On Wed, Nov 30, 2016 at 3:52 PM, Matthew Jacobs <[email protected]> wrote: > >> 1) I think that makes sense, though we need to know what the error >> conditions are, when those errors occur (e.g. at table creation, add >> column, writing data) and need tests to validate the expected negative >> cases. I can certainly guess, though I'd like for the affected API >> calls w/ new expected behavior to be documented somewhere so we can >> make changes accordingly. >> > > For the schema-related ones, they'll behave the same as any other invalid > schema does today (eg trying to use the same column name twice, or using an > existing table name, etc). > > In terms of API docs, I was hoping that the documentation can be general > enough about the types of exceptions thrown for general categories rather > than having to translate every bit of validation logic into English on the > API docs. > > In other words, we should document that createTable() throws an exception > if the schema was invalid, but I don't think we need to re-document all of > the ways in which a schema can be invalid in the API docs, do we? I think > more general user-facing documentation is probably the appropriate place. > > Test-wise, I agree that some coverage end-to-end from Impala would be > nice. We're adding tests that go through our API to validate it, but > validating that the user-exposed error is reasonable too is of course a > good idea. > > 2) Does this mean you'll test up to these limits? >> > > Yea, that's the long-term intention and would be ideal. However, for now, > I'm not necessarily guaranteeing that we've got automated testing up to > these limits today in all combinations. For example, 300 columns works, and > 64kb cells works, but maybe we'd have an issue with a table where all 300 > columns contain 64kb cells in every row. > > So, the hope with this patch series isn't to 100% constrain users such > that they can never get into a less-tested area. But hopefully we've cut > out 95% of the space here and prevented some users from shooting themselves > in the foot. For example, we recently had a case where a bug in the user > application mis-parsed some input data and tried to insert a 20MB cell, > which ended up causing an outage, and this relatively simplistic patch > would have prevented that. > > Put another way, we're telling users "if you go above this range, you may > have problems" but not guaranteeing the logical inverse "if you stay below > this range, you will never have a problem." That doesn't make it less > useful, though, IMO :) > > -Todd > > >> On Wed, Nov 30, 2016 at 3:33 PM, Todd Lipcon <[email protected]> wrote: >> > Hey Impala folks, >> > >> > Just FYI, I started the below thread on the Kudu lists about adding some >> > limits/guard rails to various dimensions of Kudu data/metadata. Please >> take >> > a look from the Impala perspective and let us know if you foresee any >> > issues with these limits. >> > >> > Just to repeat one thing: I know many SQL workloads require more than >> 300 >> > columns in a table, but right now Kudu isn't great in that realm, so >> we're >> > setting the limits conservatively. The idea is that over time as we >> improve >> > test coverage we'll raise the limits. >> > >> > -Todd >> > >> > ---------- Forwarded message ---------- >> > From: Todd Lipcon <[email protected]> >> > Date: Wed, Nov 30, 2016 at 3:30 PM >> > Subject: Re: Adding some guard rails to Kudu >> > To: [email protected], dev <[email protected]> >> > >> > >> > BTW I filed a JIRA here and started linking related issues to it: >> > https://issues.apache.org/jira/browse/KUDU-1775 >> > >> > >> > On Wed, Nov 30, 2016 at 3:25 PM, Todd Lipcon <[email protected]> wrote: >> > >> >> Hey folks, >> >> >> >> I've started working on a few patches to add "guard rails" to various >> >> user-specified dimensions in Kudu. In particular, I'm planning to add >> >> limits to the following: >> >> >> >> - max number of columns in a table (proposal: 300) >> >> - max replication factor (proposal: 7) >> >> - max table name or column name length (proposal: 256) >> >> - max size of a binary/string column cell value (proposal: 64kb) >> >> >> >> The reasoning is that, even though in some cases we don't know a >> specific >> >> issue that will happen outside these limits, we've done very little >> testing >> >> (and have no automated testing) outside of these ranges. In some >> cases, we >> >> do know that there is a certain threshold that will cause a big >> problem (eg >> >> large cell sizes can cause tablet servers to crash). In other cases, >> it's >> >> just "unknown territory". >> >> >> >> In all cases, I'm planning on making the limits overridable via an >> >> "unsafe" configuration flag. That means that a user can run with >> >> "--unlock_unsafe_flags --max_identifier_length=1000" if they want to, >> but >> >> they're explicitly accepting some risk that they're entering untested >> >> territory. >> >> >> >> Of course, in all cases, if we hear that there are people who are >> bumping >> >> the maxes higher than the defaults and having good results, we can >> consider >> >> raising the maximum, but I think it's smarter to start conservatively >> low >> >> and raise later as we increase test coverage. Also, I'm sure down the >> road >> >> we'll add features such as BLOB support or sparse column support, and >> at >> >> that time we can remove the corresponding guard rails. >> >> >> >> I'm sending this note to both user@ and dev@ to solicit feedback. Are >> >> there any other dimensions people can think of where we should >> probably add >> >> guard-rails? Is anyone out there already outside of the above ranges >> and >> >> can make a case that we're being too conservative? >> >> >> >> Thanks >> >> -Todd >> >> -- >> >> Todd Lipcon >> >> Software Engineer, Cloudera >> >> >> > >> > >> > >> > -- >> > Todd Lipcon >> > Software Engineer, Cloudera >> > >> > >> > >> > -- >> > Todd Lipcon >> > Software Engineer, Cloudera >> > > > > -- > Todd Lipcon > Software Engineer, Cloudera > -- Todd Lipcon Software Engineer, Cloudera
