>Are there any performance implications in making those the defaults?
For numeric columns without null values there should be no impact. For
numeric columns with null values, I believe there is currently a very
minor performance impact for scans/aggregations since the null value
bitmap is iterated to check which rows are null. String performance
should not be changed afaik. I will try to find some time to do some
benchmarking to ensure this is true and measure the stuff to confirm
this, and look for opportunities to improve things where possible in
order to make this happen, since performance is important. Longer
term, I have a hunch that if we actually delete all of the default
value mode code there might also be some performance improvements due
to smaller function bytecode sizes and the lack of 'if sql compatible
mode do this, else do this' being littered everywhere across the
codebase.

>Would this break compatibility for any existing native queries?
There are a handful of query shapes which might have some differences,
the most common are probably with strings since in default value mode
the empty string and null are treated interchangeably, so these kinds
of queries will need to be modified.

There are also the possibility for some subtle differences in some
numeric expressions due to the 'pretend nulls do not exist' behavior
of the default value mode config, in default value mode you can write
arguably nonsense expressions that end up evaluating stuff like "3 +
'hello'" and it will evaluate to 3, while in SQL compatible mode the
'hello' will be unable to be implicitly cast to a number and become
zero, resulting in null.

Strict boolean mode also has some subtle differences if selecting the
outputs, since it homogenizes all true/false values to 1/0 instead of
the javascript style behavior when it is off, where the truthy values
are just passed through.
https://druid.apache.org/docs/latest/querying/math-expr#logical-operator-modes

The other main change in behavior is due to the SQL three-value logic,
where a filter like "x != 'hello'" today will in default value mode
match rows with null values, but in SQL compatible mode will not match
nulls. These queries will need to be rewritten into the form "x !=
'hello' OR x is null", or there is also some native filters added
recently which could be used to wrap to make the SQL equivalent of "(x
= 'hello') is not true", https://github.com/apache/druid/pull/15182,
or "not(istrue(eq(x, 'hello'))". I'm actually open to the idea of
keeping the toggle config for 2vl/3vl since it would need checked in
very few places to control the behavior (basically in not filter), my
main goal here is to eventually get rid of
druid.generic.useDefaultValueForNull.

I plan to dig into all of this in greater depth in the migration guide
mentioned in the first email to make the transition as painless as
possible before we remove it.

On Mon, Jan 22, 2024 at 9:54 AM Xavier Léauté
<xav...@confluent.io.invalid> wrote:
>
> Two questions:
> - Are there any performance implications in making those the defaults?
> - Would this break compatibility for any existing native queries?
>
> On Wed, Jan 17, 2024 at 5:53 PM Clint Wylie <cwy...@apache.org> wrote:
>
> > Hi all,
> >
> > I wanted to discuss the deprecation and removal of the Druid configs
> > related to SQL compatibility, with the eventual goal of always running
> > in SQL compatible mode. As of Druid 28, these are the default, but in
> > the interest of dramatically reducing complexity and removing a ton of
> > code, and also cutting a lot of our CI time in half, I would
> > eventually like to remove the related configs and the code that
> > handles the now non-default behaviors completely.
> >
> > The related configs are:
> >
> > runtime.properties:
> > druid.generic.useDefaultValueForNull  - (will become always false)
> > druid.expressions.useStrictBooleans - (will become always true)
> > druid.generic.useThreeValueLogicForNativeFilters - (will become always
> > true)
> > druid.generic.ignoreNullsForStringCardinality - (irrelevant if
> > druid.generic.useDefaultValueForNull=false)
> >
> > query context:
> > sqlUseBoundAndSelectors - (this is moderately related, and defaults to
> > value of druid.generic.useDefaultValueForNull, but enhancements to
> > expressions and sql planning for lookups make this totally unnecessary
> > to keep around)
> >
> > other things to dump while we are at it:
> > druid.expressions.allowNestedArrays - (will become always true)
> >
> > There might be additional configs which can also be removed, we can
> > add to this thread if we can think of them.
> >
> > I would like to get the official deprecation process started now with
> > Druid 29, and imagine actually removing them sometime towards the end
> > of the year, so maybe Druid 32 or so?
> >
> > Before completely removing them I think I would like to get a more in
> > depth migration guide in place, to help any hold-outs that are
> > overriding the now default SQL compatible configs so that things still
> > run in the legacy mode.
> >
> > Thoughts? Concerns?
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
> > For additional commands, e-mail: dev-h...@druid.apache.org
> >
> >

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org
For additional commands, e-mail: dev-h...@druid.apache.org

Reply via email to