ege-st opened a new issue, #10907: URL: https://github.com/apache/pinot/issues/10907
## Issue Enabling NULL support on a table _after_ the table has data will result in incorrect query results for NULL expressions (i.e., `IS NULL` and `IS NOT NULL`). The Null Value Vector bitmap is used by Pinot to identify which values in a column are `null`, when Null Value support is enabled Pinot generates a Null Value Vector bitmap for each segent and this segment is used to evaluate NULL expressions. If a Null Value Vector is present then Pinot will return any value which is `null` for `IS NULL` and it will return any value which is not null for `IS NOT NULL`; however, if there is no Null Value Bitmap in a segment, then Pinot will return no values for `IS NULL` and it will return every value in the segment for `IS NOT NULL`. Critically, as Pinot currently works, a Null Value Vector bitmap is only created at the time a segment is first created and cannot be added afterwards. The issue arises when a table is created without Null support and then, after data is ingested, Null support is enabled. New segments will have a Null Value Vector bitmap, and the older segments will not. If a user then issues a `IS NULL` or `IS NOT NULL` query, the older segments will execute the query as if Null Support is disabled, and the new segments will the query as if Null Support is enabled. The results from each of these segments is then merged together, as if they all evaluated the expression the same way, and that is returned to the user as the result. This will result in the user getting many false positives in their result set. ## Suggestion There are a few issues here, from the user perspective: 1. A user can run queries using `IS NULL` and `IS NOT NULL` even if a table does not support null values. 2. If Null Value support is added to a table after it already has data, then some segments will return incorrect results for a Null expression query. Some things we should consider doing are: 1. Not allow `IS NULL` or `IS NOT NULL` to be used if a table does not have null support enabled. 2. If a user wants to enable null support on a table after it was created, we should not allow it, unless the user uses an override command and the user explicitly states what the null value for the columns should be. 3. If the user explicitly provides the definition for what the Null value of a column is, then Minion can "reindex" a segment and construct the Null Value Vector bitmap. 4. Queries that use Null expressions will return an error until every segment has been reindexed by Minion. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
