ankitsultana commented on issue #14694: URL: https://github.com/apache/pinot/issues/14694#issuecomment-2558581081
Yeah I realized this needs more context. ### Problem Consider that we have a Realtime table with a single Kafka partition, that has `event_timestamp` values in increasing order. But `event_timestamp` can also be null occasionally. Visually the segments will look as follows: **Figure-1:** ``` range of event_timestamp values per segment for only the non-null values +------------------+------------------+------------------+------------------+ | S0: [0, 1000] | S1: [1000, 2000] | S2: [2000, 3000] | S3: [3000, 4000] | +------------------+------------------+------------------+------------------+ ``` **Figure-2:** ``` range of event_timestamp values per segment when you also consider the default null value (0 in this case) +------------------+------------------+------------------+------------------+ | S0: [0, 1000] | S1: [0, 2000] | S2: [0, 3000] | S3: [0, 4000] | +------------------+------------------+------------------+------------------+ ``` From a customer perspective, if they have range filters on the `event_timestamp` value (say `event_timestamp BETWEEN 2500 AND 2800`), they usually expect that they will only be evaluating the segments whose min/max values have overlap with the range predicate in the query. And **technically** this is what happens today, but what is easily overlooked is that when you can have nulls in the column, then the default null values end up becoming the min or max of almost every segment, as can be seen in Figure-2. This means that segment pruning will no longer work and the range predicate will have to be evaluated for every segment. Even with a range index, the **cost of evaluating these predicates ends up dominating the CPU of quite a few workloads I have seen internally**. <img width="2467" alt="image" src="https://github.com/user-attachments/assets/516dc43e-614a-4554-82f2-3f43fc01cb0e" /> ### Factors Affecting Prominence of this Issue 1. When you have a large number of segments 2. When segment pruning via bloom filters, etc. is not sufficient 3. etc. etc. ### Optimizing When Other Selective and/or Efficient Predicates While segment pruning is not easy to do (see section below), we can still try to limit the number of times the range predicate is evaluated per-query when the other filters are quite selective. Example: `uuid_col1 = uuid1 AND uuid_col2 = uuid2 ... AND event_timestamp BETWEEN x AND y`. To achieve this, we should short circuit the AndFilter computation when the currently evaluated filters have already let to 0 matching rows. This is something we should do regardless of this issue since it can also help many other cases. (e.g. multiple text_match predicates in a query) ### Possible Long Term Fix? If we could set the expectation on the users that they should add `event_timestamp IS NOT NULL` to their query alongside their range predicates to achieve better segment pruning, then we might have some possible solutions to implement. But most of the options I can think of will look awkward. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
