BTW I am doing some testing in Impala, and it seems like Impala silently
truncates column names to 128 characters.

Even more fun is:

create table todd_test
(xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
int,
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyy
int)

(two long names that differ only after their first 128 characters, so they
become identical once truncated) results in:

ImpalaRuntimeException: Error making 'createTable' RPC to Hive Metastore:
CAUSED BY: MetaException: Add request failed : INSERT INTO `COLUMNS_V2`
(`CD_ID`,`COMMENT`,`COLUMN_NAME`,`TYPE_NAME`,`INTEGER_IDX`) VALUES
(?,?,?,?,?)
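A minimal Python pre-flight check for this case. It assumes that names are cut to their first 128 characters, based only on the behavior observed above and not on any documented Impala constant. `find_truncation_collisions` is a hypothetical helper. It reports which requested column names would become identical after truncation, which is presumably the duplicate the metastore INSERT trips over.

```python
from collections import defaultdict

# Assumption: Impala keeps only the first 128 characters of a column name,
# based on the behavior observed in this thread (not an official constant).
IMPALA_MAX_NAME_LEN = 128


def find_truncation_collisions(names, limit=IMPALA_MAX_NAME_LEN):
    """Group distinct names that become identical once truncated to `limit`."""
    groups = defaultdict(list)
    for name in names:
        groups[name[:limit]].append(name)
    # Only prefixes shared by two or more distinct names are collisions.
    return {prefix: originals
            for prefix, originals in groups.items()
            if len(set(originals)) > 1}


if __name__ == "__main__":
    # Stand-ins for the two long names in the CREATE TABLE above.
    a = "x" * 1000
    b = "x" * 998 + "yy"
    collisions = find_truncation_collisions([a, b, "id"])
    for prefix, originals in collisions.items():
        print(f"{len(originals)} names collide on a {len(prefix)}-char prefix")
```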

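So given the 128-character Impala limit, perhaps the Kudu limit should be
more than 256, since you're creating Kudu table names as
'<database>::<table>'. If database and table names can each be 128
characters, the combined name can reach 128 + 2 + 128 = 258 characters.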
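The arithmetic as a small Python sketch. It assumes Impala caps database and table names at the same 128 characters observed for column names, and that the Kudu name is built as '<database>::<table>'. `kudu_table_name` is a hypothetical helper, not an Impala or Kudu API.

```python
# Assumptions (not official constants): the 128-character identifier cap
# observed in Impala and the 256-character limit proposed for Kudu.
IMPALA_MAX_IDENT_LEN = 128
PROPOSED_KUDU_MAX_NAME_LEN = 256
SEPARATOR = "::"


def kudu_table_name(database: str, table: str) -> str:
    """Hypothetical helper mirroring the '<database>::<table>' naming scheme."""
    return f"{database}{SEPARATOR}{table}"


# Longest Kudu table name Impala could produce under these assumptions.
worst_case = kudu_table_name("d" * IMPALA_MAX_IDENT_LEN, "t" * IMPALA_MAX_IDENT_LEN)

print(len(worst_case))                               # 258
print(len(worst_case) > PROPOSED_KUDU_MAX_NAME_LEN)  # True
```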


On Wed, Nov 30, 2016 at 4:03 PM, Todd Lipcon <[email protected]> wrote:

> On Wed, Nov 30, 2016 at 3:52 PM, Matthew Jacobs <[email protected]> wrote:
>
>> 1) I think that makes sense, though we need to know what the error
>> conditions are, when those errors occur (e.g. at table creation, add
>> column, writing data) and need tests to validate the expected negative
>> cases. I can certainly guess, though I'd like for the affected API
>> calls w/ new expected behavior to be documented somewhere so we can
>> make changes accordingly.
>>
>
> For the schema-related ones, they'll behave the same as any other invalid
> schema does today (e.g. trying to use the same column name twice, or using
> an existing table name, etc.).
>
> In terms of API docs, I was hoping that the documentation can be general
> enough about the types of exceptions thrown for general categories rather
> than having to translate every bit of validation logic into English on the
> API docs.
>
> In other words, we should document that createTable() throws an exception
> if the schema was invalid, but I don't think we need to re-document all of
> the ways in which a schema can be invalid in the API docs, do we? I think
> more general user-facing documentation is probably the appropriate place.
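A minimal Python sketch of the approach described here. The API documents one exception category for an invalid schema, and the specific reason travels in the message instead of being enumerated in the API docs. `InvalidSchemaError` and `validate_schema` are hypothetical names, not the Kudu client API.

```python
class InvalidSchemaError(ValueError):
    """Single, documented error category for any invalid schema."""


def validate_schema(columns, existing_tables, table_name):
    """Raise InvalidSchemaError for any schema problem; the reason is in the message."""
    if table_name in existing_tables:
        raise InvalidSchemaError(f"table {table_name!r} already exists")
    seen = set()
    for name in columns:
        if name in seen:
            raise InvalidSchemaError(f"duplicate column name {name!r}")
        seen.add(name)


# Callers handle the category, not each individual validation rule.
try:
    validate_schema(["id", "value", "id"], existing_tables=set(), table_name="t")
except InvalidSchemaError as e:
    print(f"createTable rejected: {e}")
```

New validation rules (such as the guard rails) then raise the same documented category, so the API docs don't need to change when a rule is added.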
>
> Test-wise, I agree that some coverage end-to-end from Impala would be
> nice. We're adding tests that go through our API to validate it, but
> validating that the user-exposed error is reasonable too is of course a
> good idea.
>
>> 2) Does this mean you'll test up to these limits?
>>
>
> Yeah, that's the long-term intention and would be ideal. However, for now,
> I'm not necessarily guaranteeing that we've got automated testing up to
> these limits today in all combinations. For example, 300 columns works, and
> 64kb cells works, but maybe we'd have an issue with a table where all 300
> columns contain 64kb cells in every row.
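To put that untested combination in numbers, here is a back-of-the-envelope Python calculation. It assumes "64kb" means 64 KiB (65,536 bytes) and ignores encoding and per-cell overhead.

```python
MAX_COLUMNS = 300            # proposed column limit
MAX_CELL_BYTES = 64 * 1024   # proposed cell limit, assuming 64 KiB

# Worst case: every column in a row holds a maximum-size cell.
row_bytes = MAX_COLUMNS * MAX_CELL_BYTES
print(row_bytes)                  # 19660800
print(row_bytes / (1024 * 1024))  # 18.75
```

So a single maximal row is roughly 18.75 MiB, the same order of magnitude as a 20MB cell, even though each individual cell is within the limit.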
>
> So, the hope with this patch series isn't to 100% constrain users such
> that they can never get into a less-tested area. But hopefully we've cut
> out 95% of the space here and prevented some users from shooting themselves
> in the foot. For example, we recently had a case where a bug in the user
> application mis-parsed some input data and tried to insert a 20MB cell,
> which ended up causing an outage, and this relatively simplistic patch
> would have prevented that.
>
> Put another way, we're telling users "if you go above this range, you may
> have problems" but not guaranteeing the logical inverse "if you stay below
> this range, you will never have a problem." That doesn't make it less
> useful, though, IMO :)
>
> -Todd
>
>
>> On Wed, Nov 30, 2016 at 3:33 PM, Todd Lipcon <[email protected]> wrote:
>> > Hey Impala folks,
>> >
>> > Just FYI, I started the below thread on the Kudu lists about adding some
>> > limits/guard rails to various dimensions of Kudu data/metadata. Please
>> > take a look from the Impala perspective and let us know if you foresee
>> > any issues with these limits.
>> >
>> > Just to repeat one thing: I know many SQL workloads require more than
>> > 300 columns in a table, but right now Kudu isn't great in that realm, so
>> > we're setting the limits conservatively. The idea is that over time as
>> > we improve test coverage we'll raise the limits.
>> >
>> > -Todd
>> >
>> > ---------- Forwarded message ----------
>> > From: Todd Lipcon <[email protected]>
>> > Date: Wed, Nov 30, 2016 at 3:30 PM
>> > Subject: Re: Adding some guard rails to Kudu
>> > To: [email protected], dev <[email protected]>
>> >
>> >
>> > BTW I filed a JIRA here and started linking related issues to it:
>> > https://issues.apache.org/jira/browse/KUDU-1775
>> >
>> >
>> > On Wed, Nov 30, 2016 at 3:25 PM, Todd Lipcon <[email protected]> wrote:
>> >
>> >> Hey folks,
>> >>
>> >> I've started working on a few patches to add "guard rails" to various
>> >> user-specified dimensions in Kudu. In particular, I'm planning to add
>> >> limits to the following:
>> >>
>> >> - max number of columns in a table (proposal: 300)
>> >> - max replication factor (proposal: 7)
>> >> - max table name or column name length (proposal: 256)
>> >> - max size of a binary/string column cell value (proposal: 64kb)
>> >>
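A minimal Python sketch of checking the four proposed limits. The default values come from the proposal above, assuming "64kb" means 64 KiB. The `GuardRails` class, `GuardRailViolation`, and the check functions are hypothetical and are not Kudu's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuardRails:
    # Defaults are the values proposed in this thread.
    max_columns: int = 300
    max_replication_factor: int = 7
    max_identifier_length: int = 256
    max_cell_bytes: int = 64 * 1024  # assuming "64kb" means 64 KiB


class GuardRailViolation(ValueError):
    pass


def check_table(limits, table_name, column_names, num_replicas):
    """Reject table definitions outside the tested ranges."""
    if len(column_names) > limits.max_columns:
        raise GuardRailViolation(
            f"{len(column_names)} columns exceeds max {limits.max_columns}")
    if num_replicas > limits.max_replication_factor:
        raise GuardRailViolation(
            f"replication factor {num_replicas} exceeds max "
            f"{limits.max_replication_factor}")
    for ident in [table_name, *column_names]:
        if len(ident) > limits.max_identifier_length:
            raise GuardRailViolation(
                f"identifier of length {len(ident)} exceeds max "
                f"{limits.max_identifier_length}")


def check_cell(limits, value: bytes):
    """Reject a single binary/string cell larger than the limit."""
    if len(value) > limits.max_cell_bytes:
        raise GuardRailViolation(
            f"cell of {len(value)} bytes exceeds max {limits.max_cell_bytes}")


if __name__ == "__main__":
    rails = GuardRails()
    check_table(rails, "metrics", ["host", "ts", "value"], num_replicas=3)  # passes
    try:
        check_cell(rails, b"\0" * (20 * 1024 * 1024))  # a ~20 MB mis-parsed value
    except GuardRailViolation as e:
        print(e)
```

Checking each dimension independently is what makes this cheap, and also why it can't promise that every combination below the limits is safe.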
>> >> The reasoning is that, even though in some cases we don't know a
>> >> specific issue that will happen outside these limits, we've done very
>> >> little testing (and have no automated testing) outside of these ranges.
>> >> In some cases, we do know that there is a certain threshold that will
>> >> cause a big problem (e.g. large cell sizes can cause tablet servers to
>> >> crash). In other cases, it's just "unknown territory".
>> >>
>> >> In all cases, I'm planning on making the limits overridable via an
>> >> "unsafe" configuration flag. That means that a user can run with
>> >> "--unlock_unsafe_flags --max_identifier_length=1000" if they want to,
>> >> but they're explicitly accepting some risk that they're entering
>> >> untested territory.
>> >>
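A minimal Python sketch of the override semantics described above: raising a limit is accepted only when the unlock flag is also given. The flag names are the ones quoted in this message. The parsing and the set of "unsafe" flags here are illustrative assumptions, not Kudu's actual flag handling.

```python
# Assumption: which flags count as "unsafe" is illustrative only.
UNSAFE_FLAGS = {"max_identifier_length"}


def parse_flags(argv):
    """Parse '--name=value' and bare '--name' (treated as boolean true)."""
    flags = {}
    for arg in argv:
        if not arg.startswith("--"):
            raise ValueError(f"unexpected argument {arg!r}")
        name, sep, value = arg[2:].partition("=")
        flags[name] = value if sep else "true"
    return flags


def check_unsafe_flags(flags):
    """Refuse unsafe overrides unless --unlock_unsafe_flags is also given."""
    used = sorted(flags.keys() & UNSAFE_FLAGS)
    if used and flags.get("unlock_unsafe_flags") != "true":
        raise ValueError(f"unsafe flag(s) {used} require --unlock_unsafe_flags")
    return flags


flags = check_unsafe_flags(
    parse_flags(["--unlock_unsafe_flags", "--max_identifier_length=1000"]))
print(int(flags["max_identifier_length"]))  # 1000
```

Requiring the extra flag makes the user opt in to untested territory explicitly rather than by accident.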
>> >> Of course, in all cases, if we hear that there are people who are
>> >> bumping the maxes higher than the defaults and having good results, we
>> >> can consider raising the maximum, but I think it's smarter to start
>> >> conservatively low and raise later as we increase test coverage. Also,
>> >> I'm sure down the road we'll add features such as BLOB support or
>> >> sparse column support, and at that time we can remove the corresponding
>> >> guard rails.
>> >>
>> >> I'm sending this note to both user@ and dev@ to solicit feedback. Are
>> >> there any other dimensions people can think of where we should
>> >> probably add guard rails? Is anyone out there already outside of the
>> >> above ranges who can make a case that we're being too conservative?
>> >>
>> >> Thanks
>> >> -Todd
>> >> --
>> >> Todd Lipcon
>> >> Software Engineer, Cloudera
>> >>
>> >
>>
>
>
>



-- 
Todd Lipcon
Software Engineer, Cloudera
