ankitsultana commented on issue #14694:
URL: https://github.com/apache/pinot/issues/14694#issuecomment-2558581081

   Yeah I realized this needs more context.
   
   ### Problem
   
   Consider a Realtime table with a single Kafka partition, where `event_timestamp` values arrive in increasing order but can occasionally be null. Visually, the segments will look as follows:
   
   **Figure-1:**
   ```
   range of event_timestamp values per segment for only the non-null values
   +------------------+------------------+------------------+------------------+
   | S0: [0, 1000]    | S1: [1000, 2000] | S2: [2000, 3000] | S3: [3000, 4000] |
   +------------------+------------------+------------------+------------------+
   ```
   
   **Figure-2:**
   ```
   range of event_timestamp values per segment when you also consider the default null value (0 in this case)
   +------------------+------------------+------------------+------------------+
   | S0: [0, 1000]    | S1: [0, 2000]    | S2: [0, 3000]    | S3: [0, 4000]    |
   +------------------+------------------+------------------+------------------+
   ```
   
   From a customer perspective, users with range filters on `event_timestamp` (say `event_timestamp BETWEEN 2500 AND 2800`) usually expect that only segments whose min/max range overlaps the range predicate in the query will be evaluated. **Technically** this is what happens today, but what is easily overlooked is that when the column can contain nulls, the default null value ends up becoming the min (or max) of almost every segment, as can be seen in Figure-2.
   
   This means that segment pruning will no longer work and the range predicate 
will have to be evaluated for every segment. Even with a range index, the 
**cost of evaluating these predicates ends up dominating the CPU of quite a few 
workloads I have seen internally**.
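
To make the effect concrete, here is a small Python sketch of min/max overlap pruning (purely illustrative; this is not Pinot's actual pruning code). With the Figure-1 ranges, only S2 survives the `BETWEEN 2500 AND 2800` predicate; with the Figure-2 ranges, the `seg_min <= hi` half of the check can never prune, so every segment whose max reaches the query window must be scanned:

```python
# Illustrative sketch of min/max overlap pruning; not Pinot's actual code.

def overlaps(seg_min, seg_max, lo, hi):
    # A segment must be scanned iff its [min, max] range intersects [lo, hi].
    return seg_max >= lo and seg_min <= hi

LO, HI = 2500, 2800  # event_timestamp BETWEEN 2500 AND 2800

# Figure-1: per-segment ranges over non-null values only.
non_null = [(0, 1000), (1000, 2000), (2000, 3000), (3000, 4000)]
print([overlaps(mn, mx, LO, HI) for mn, mx in non_null])
# -> [False, False, True, False]   (only S2 is scanned)

# Figure-2: the default null value (0) drags every segment's min to 0,
# so the `seg_min <= hi` half of the check never prunes anything.
with_nulls = [(0, 1000), (0, 2000), (0, 3000), (0, 4000)]
print([overlaps(mn, mx, LO, HI) for mn, mx in with_nulls])
# -> [False, False, True, True]   (S2, S3, and every future segment)
```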
   
   <img width="2467" alt="image" src="https://github.com/user-attachments/assets/516dc43e-614a-4554-82f2-3f43fc01cb0e" />
   
   ### Factors Affecting Prominence of this Issue
   
   1. When you have a large number of segments
   2. When segment pruning via bloom filters, etc. is not sufficient
   3. etc. etc.
   
   ### Optimizing When Other Selective and/or Efficient Predicates Exist
   
   While segment pruning is not easy to do (see section below), we can still 
try to limit the number of times the range predicate is evaluated per-query 
when the other filters are quite selective. Example: `uuid_col1 = uuid1 AND 
uuid_col2 = uuid2 ... AND event_timestamp BETWEEN x AND y`.
   
   To achieve this, we should short circuit the AndFilter computation when the filters evaluated so far have already led to 0 matching rows. This is something we should do regardless of this issue, since it can also help many other cases (e.g., multiple `text_match` predicates in a query).
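
A minimal sketch of that short circuit (hypothetical names; Pinot's actual FilterOperator interfaces differ): each child filter narrows the candidate set, and once it is empty the remaining, possibly expensive, children are skipped entirely.

```python
# Hypothetical sketch of a short-circuiting AND over filter operators.
# Names are illustrative, not Pinot's real API.

evaluated = []  # records which child filters actually ran

def make_filter(name, predicate):
    def child(candidates):
        evaluated.append(name)
        return {d for d in candidates if predicate(d)}
    return child

def and_filter(children, candidates):
    matching = set(candidates)
    for child in children:
        matching = child(matching)
        if not matching:  # short circuit: no rows left to match
            break
    return matching

children = [
    make_filter("uuid_col1 = uuid1", lambda d: False),        # very selective
    make_filter("event_timestamp range", lambda d: True),     # expensive
]
result = and_filter(children, range(1000))
# result is empty and the expensive range child was never evaluated:
# evaluated == ["uuid_col1 = uuid1"]
```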
   
   ### Possible Long Term Fix?
   
   If we could set the expectation with users that they should add `event_timestamp IS NOT NULL` to their queries alongside their range predicates to achieve better segment pruning, then we might have some possible solutions to implement. But most of the options I can think of look awkward.
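
As one sketch of such an option (hypothetical metadata fields, purely illustrative): if segment metadata tracked min/max over the non-null values separately, a pruner could use those tighter bounds whenever the query carries `event_timestamp IS NOT NULL`, since dropping the default-null rows is then semantically safe.

```python
# Hypothetical sketch: prune with non-null min/max only when the query
# guarantees NULL rows cannot match. Field names are illustrative.

def can_prune(seg, lo, hi, query_has_not_null):
    if query_has_not_null:
        mn, mx = seg["non_null_min"], seg["non_null_max"]
    else:
        mn, mx = seg["min"], seg["max"]  # ranges widened by the null default
    return mx < lo or mn > hi  # prune iff no overlap with [lo, hi]

# S3 from Figure-2 / Figure-1:
s3 = {"min": 0, "max": 4000, "non_null_min": 3000, "non_null_max": 4000}

# event_timestamp BETWEEN 2500 AND 2800:
print(can_prune(s3, 2500, 2800, query_has_not_null=False))  # -> False (scan)
print(can_prune(s3, 2500, 2800, query_has_not_null=True))   # -> True (pruned)
```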
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

