>Are there any performance implications in making those the defaults? For numeric columns without null values there should be no impact. For numeric columns with null values, I believe there is currently a very minor performance impact for scans/aggregations since the null value bitmap is iterated to check which rows are null. String performance should not be changed afaik. I will try to find some time to do some benchmarking to ensure this is true and measure the stuff to confirm this, and look for opportunities to improve things where possible in order to make this happen, since performance is important. Longer term, I have a hunch that if we actually delete all of the default value mode code there might also be some performance improvements due to smaller function bytecode sizes and the lack of 'if sql compatible mode do this, else do this' being littered everywhere across the codebase.
>Would this break compatibility for any existing native queries? There are a handful of query shapes which might have some differences, the most common are probably with strings since in default value mode the empty string and null are treated interchangeably, so these kinds of queries will need to be modified. There are also the possibility for some subtle differences in some numeric expressions due to the 'pretend nulls do not exist' behavior of the default value mode config, in default value mode you can write arguably nonsense expressions that end up evaluating stuff like "3 + 'hello'" and it will evaluate to 3, while in SQL compatible mode the 'hello' will be unable to be implicitly cast to a number and become zero, resulting in null. Strict boolean mode also has some subtle differences if selecting the outputs, since it homogenizes all true/false values to 1/0 instead of the javascript style behavior when it is off, where the truthy values are just passed through. https://druid.apache.org/docs/latest/querying/math-expr#logical-operator-modes The other main change in behavior is due to the SQL three-value logic, where a filter like "x != 'hello'" today will in default value mode match rows with null values, but in SQL compatible mode will not match nulls. These queries will need to be rewritten into the form "x != 'hello' OR x is null", or there is also some native filters added recently which could be used to wrap to make the SQL equivalent of "(x = 'hello') is not true", https://github.com/apache/druid/pull/15182, or "not(istrue(eq(x, 'hello'))". I'm actually open to the idea of keeping the toggle config for 2vl/3vl since it would need checked in very few places to control the behavior (basically in not filter), my main goal here is to eventually get rid of druid.generic.useDefaultValueForNull. I plan to dig into all of this in greater depth in the migration guide mentioned in the first email to make the transition as painless as possible before we remove it. On Mon, Jan 22, 2024 at 9:54 AM Xavier Léauté <xav...@confluent.io.invalid> wrote: > > Two questions: > - Are there any performance implications in making those the defaults? > - Would this break compatibility for any existing native queries? > > On Wed, Jan 17, 2024 at 5:53 PM Clint Wylie <cwy...@apache.org> wrote: > > > Hi all, > > > > I wanted to discuss the deprecation and removal of the Druid configs > > related to SQL compatibility, with the eventual goal of always running > > in SQL compatible mode. As of Druid 28, these are the default, but in > > the interest of dramatically reducing complexity and removing a ton of > > code, and also cutting a lot of our CI time in half, I would > > eventually like to remove the related configs and the code that > > handles the now non-default behaviors completely. > > > > The related configs are: > > > > runtime.properties: > > druid.generic.useDefaultValueForNull - (will become always false) > > druid.expressions.useStrictBooleans - (will become always true) > > druid.generic.useThreeValueLogicForNativeFilters - (will become always > > true) > > druid.generic.ignoreNullsForStringCardinality - (irrelevant if > > druid.generic.useDefaultValueForNull=false) > > > > query context: > > sqlUseBoundAndSelectors - (this is moderately related, and defaults to > > value of druid.generic.useDefaultValueForNull, but enhancements to > > expressions and sql planning for lookups make this totally unnecessary > > to keep around) > > > > other things to dump while we are at it: > > druid.expressions.allowNestedArrays - (will become always true) > > > > There might be additional configs which can also be removed, we can > > add to this thread if we can think of them. > > > > I would like to get the official deprecation process started now with > > Druid 29, and imagine actually removing them sometime towards the end > > of the year, so maybe Druid 32 or so? > > > > Before completely removing them I think I would like to get a more in > > depth migration guide in place, to help any hold-outs that are > > overriding the now default SQL compatible configs so that things still > > run in the legacy mode. > > > > Thoughts? Concerns? > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org > > For additional commands, e-mail: dev-h...@druid.apache.org > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@druid.apache.org For additional commands, e-mail: dev-h...@druid.apache.org