Hey Impala folks,

Just FYI, I started the below thread on the Kudu lists about adding some
limits/guard rails to various dimensions of Kudu data/metadata. Please take
a look from the Impala perspective and let us know if you foresee any
issues with these limits.

Just to repeat one thing: I know many SQL workloads require more than 300
columns in a table, but right now Kudu isn't great in that realm, so we're
setting the limits conservatively. The idea is that over time as we improve
test coverage we'll raise the limits.

-Todd

---------- Forwarded message ----------
From: Todd Lipcon <t...@cloudera.com>
Date: Wed, Nov 30, 2016 at 3:30 PM
Subject: Re: Adding some guard rails to Kudu
To: u...@kudu.apache.org, dev <d...@kudu.apache.org>


BTW I filed a JIRA here and started linking related issues to it:
https://issues.apache.org/jira/browse/KUDU-1775


On Wed, Nov 30, 2016 at 3:25 PM, Todd Lipcon <t...@cloudera.com> wrote:

> Hey folks,
>
> I've started working on a few patches to add "guard rails" to various
> user-specified dimensions in Kudu. In particular, I'm planning to add
> limits to the following:
>
> - max number of columns in a table (proposal: 300)
> - max replication factor (proposal: 7)
> - max table name or column name length (proposal: 256)
> - max size of a binary/string column cell value (proposal: 64 KB)
>
> The reasoning is that, even though in some cases we don't know a specific
> issue that will happen outside these limits, we've done very little testing
> (and have no automated testing) outside of these ranges. In some cases, we
> do know that there is a certain threshold that will cause a big problem (e.g.,
> large cell sizes can cause tablet servers to crash). In other cases, it's
> just "unknown territory".
>
> In all cases, I'm planning on making the limits overridable via an
> "unsafe" configuration flag. That means that a user can run with
> "--unlock_unsafe_flags --max_identifier_length=1000" if they want to, but
> they're explicitly accepting some risk that they're entering untested
> territory.
>
> Of course, in all cases, if we hear that there are people who are bumping
> the maxes higher than the defaults and having good results, we can consider
> raising the maximum, but I think it's smarter to start conservatively low
> and raise later as we increase test coverage. Also, I'm sure down the road
> we'll add features such as BLOB support or sparse column support, and at
> that time we can remove the corresponding guard rails.
>
> I'm sending this note to both user@ and dev@ to solicit feedback. Are
> there any other dimensions people can think of where we should probably add
> guard-rails? Is anyone out there already outside of the above ranges and
> can make a case that we're being too conservative?
>
> Thanks
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
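
For readers who want to check a schema against the proposed defaults quoted above, here is a minimal sketch in Python. It is not Kudu code: the names GuardRails, resolve_limits, check_table and check_cell are hypothetical, only max_identifier_length echoes a flag name from the message, and it assumes that "64 KB" means 64 * 1024 bytes and that identifier length is counted in characters. It also mimics the "unsafe" override idea by rejecting any changed limit unless the caller explicitly unlocks it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuardRails:
    """Proposed defaults from the thread. Field names are hypothetical."""
    max_columns: int = 300
    max_replication_factor: int = 7
    max_identifier_length: int = 256          # assumed to count characters
    max_cell_size_bytes: int = 64 * 1024      # assumes 64 KB = 65536 bytes


def resolve_limits(overrides=None, unlock_unsafe=False):
    """Return effective limits; overrides require an explicit unsafe unlock."""
    overrides = overrides or {}
    if overrides and not unlock_unsafe:
        names = ", ".join(sorted(overrides))
        raise ValueError("unsafe overrides are locked: " + names)
    return GuardRails(**overrides)


def check_table(name, columns, replication_factor, limits):
    """Return a list of human-readable violations (empty if none)."""
    problems = []
    if len(columns) > limits.max_columns:
        problems.append(
            "%d columns exceeds limit of %d" % (len(columns), limits.max_columns))
    if replication_factor > limits.max_replication_factor:
        problems.append(
            "replication factor %d exceeds limit of %d"
            % (replication_factor, limits.max_replication_factor))
    # Table and column names share one identifier length limit.
    for ident in [name] + list(columns):
        if len(ident) > limits.max_identifier_length:
            problems.append(
                "identifier of length %d exceeds limit of %d"
                % (len(ident), limits.max_identifier_length))
    return problems


def check_cell(value, limits):
    """True if a binary/string cell value fits within the size limit."""
    return len(value) <= limits.max_cell_size_bytes
```

A table with 301 columns, or a name longer than 256 characters, would be reported under the defaults. Calling resolve_limits({"max_identifier_length": 1000}, unlock_unsafe=True) relaxes the name check, which mirrors the explicit opt-in described in the quoted message.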



-- 
Todd Lipcon
Software Engineer, Cloudera



