clintropolis commented on a change in pull request #8894: document SQL 
compatible null handling mode
URL: https://github.com/apache/incubator-druid/pull/8894#discussion_r348299748
 
 

 ##########
 File path: docs/design/segments.md
 ##########
 @@ -143,6 +143,11 @@ the 'column data' is an array of values. Additionally, a row with *n*
 values in 'column data' will have *n* non-zero valued entries in
 bitmaps.
 
+## SQL Compatible Null Handling
+By default, Druid string dimension columns use the values `''` and `null` interchangeably, and numeric and metric columns cannot represent `null` at all, instead coercing nulls to `0`. However, Druid also provides an SQL compatible null handling mode, which must be enabled at the system level through `druid.generic.useDefaultValueForNull`. When set to `false`, this setting allows Druid to create segments _at ingestion time_ whose string columns can distinguish `''` from `null`, and whose numeric columns can represent `null` valued rows instead of `0`.
+
+String dimension columns contain no additional column structures in this mode, instead just reserving an additional dictionary entry for the `null` value. Numeric columns, however, are stored in the segment with an additional `bitmap` whose set bits indicate `null` valued rows. In addition to slightly increased segment sizes, this means that SQL compatible null handling also comes at a query time cost for numeric columns, which must now check whether or not each row is null valued during selection and aggregation. This overhead has been calculated to be approximately 10-20 nanoseconds _per row_ scanned in each query, so consider whether the expressivity is worth the performance hit for your individual use case.
 
 Review comment:
   Changed to suggestion
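
For readers of this thread, the numeric null bitmap described in the diff above can be pictured with a small sketch. This is only an illustration under stated assumptions: the class `NullableLongColumnSketch` and its methods are hypothetical names, `java.util.BitSet` stands in for whatever bitmap format segments actually use, and returning `null` for an all-null sum follows general SQL semantics rather than anything stated in the diff. It is not Druid's column implementation.

```java
import java.util.BitSet;

// Hypothetical illustration only; not Druid's actual column code.
public class NullableLongColumnSketch {
    private final long[] values;      // stored values; null rows hold a placeholder 0
    private final BitSet nullBitmap;  // a set bit at index i marks row i as null

    public NullableLongColumnSketch(Long[] rows) {
        values = new long[rows.length];
        nullBitmap = new BitSet(rows.length);
        for (int i = 0; i < rows.length; i++) {
            if (rows[i] == null) {
                nullBitmap.set(i);   // remember the null; values[i] stays 0
            } else {
                values[i] = rows[i];
            }
        }
    }

    // Default-mode style: the bitmap is ignored, so null and 0 are indistinguishable.
    public long sumTreatingNullAsZero() {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    // SQL-compatible style: every row pays for a bitmap check before it is aggregated.
    // Returns null when no row holds a value (assumed SQL SUM semantics).
    public Long sumSkippingNulls() {
        boolean sawValue = false;
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            if (nullBitmap.get(i)) {
                continue;            // the extra per-row check the docs warn about
            }
            sum += values[i];
            sawValue = true;
        }
        return sawValue ? Long.valueOf(sum) : null;
    }

    // Number of rows that hold an actual value.
    public int countNonNull() {
        return values.length - nullBitmap.cardinality();
    }
}
```

For rows `{5, null, 0, 7}` the two sums agree, but only the bitmap-aware path can tell that three rows, not four, hold a value. That extra `nullBitmap.get(i)` call per row is the kind of overhead the proposed paragraph refers to.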

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
