ege-st opened a new issue, #10907:
URL: https://github.com/apache/pinot/issues/10907

   ## Issue
   Enabling NULL support on a table _after_ the table has data will result in 
incorrect query results
   for NULL expressions (i.e., `IS NULL` and `IS NOT NULL`). The Null Value 
Vector bitmap is used
   by Pinot to identify which values in a column are `null`, when Null Value 
support is enabled
   Pinot generates a Null Value Vector bitmap for each segent and this segment 
is used to evaluate
   NULL expressions.  If a Null Value Vector is present then Pinot will return 
any value which is
   `null` for `IS NULL` and it will return any value which is not null for `IS 
NOT NULL`; however,
   if there is no Null Value Bitmap in a segment, then Pinot will return no 
values for `IS NULL`
   and it will return every value in the segment for `IS NOT NULL`. Critically, 
as Pinot currently
   works, a Null Value Vector bitmap is only created at the time a segment is 
first created and cannot
   be added afterwards.
   
   The issue arises when a table is created without Null support and then, 
after data is ingested,
   Null support is enabled. New segments will have a Null Value Vector bitmap, 
and the older segments
   will not. If a user then issues a `IS NULL` or `IS NOT NULL` query, the 
older segments will execute
   the query as if Null Support is disabled, and the new segments will the 
query as if Null Support is
   enabled. The results from each of these segments is then merged together, as 
if they all evaluated the
   expression the same way, and that is returned to the user as the result. 
This will result in the user getting
   many false positives in their result set.
   
   ## Suggestion
   There are a few issues here, from the user perspective:
   
   1. A user can run queries using `IS NULL` and `IS NOT NULL` even if a table 
does not support null
   values.
   2. If Null Value support is added to a table after it already has data, then 
some segments will return
   incorrect results for a Null expression query.
   
   Some things we should consider doing are:
   
   1. Not allow `IS NULL` or `IS NOT NULL` to be used if a table does not have 
null support enabled.
   2. If a user wants to enable null support on a table after it was created, 
we should not allow it,
   unless the user uses an override command and the user explicitly states what 
the null value for the
   columns should be.
   3. If the user explicitly provides the definition for what the Null value of 
a column is, then Minion
   can "reindex" a segment and construct the Null Value Vector bitmap.
   4. Queries that use Null expressions will return an error until every 
segment has been reindexed by
   Minion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to