[PR] [Query Engine] ANY VALUE Support [pinot]

via GitHub Sat, 23 Aug 2025 17:33:37 -0700


praveenc7 opened a new pull request, #16678:
URL: https://github.com/apache/pinot/pull/16678


   # ANY_VALUE Aggregation Function
   
   ## Overview
   
   The `ANY_VALUE` aggregation function returns any arbitrary value from the 
specified column for each group in a `GROUP BY` query. This function is 
particularly useful for optimizing queries where you need to include a column 
in the `SELECT` clause that has a 1:1 mapping with the `GROUP BY` columns, 
without forcing that column into the `GROUP BY` clause.
   
   ## Syntax
   
   ```sql
   ANY_VALUE(column_name)
   ```
   
   ## Use Cases
   
   ### Primary Use Case: Avoiding Unnecessary GROUP BY Columns
   
   Standard SQL requires that all non-aggregation expressions in the `SELECT` 
clause must also be part of the `GROUP BY` clause. However, there are scenarios 
where:
   
   1. **Functional Dependency**: A column has a 1:1 mapping with the grouped 
columns
   2. **Performance Optimization**: Adding the column to `GROUP BY` would 
increase computation without changing the logical result
   3. **Query Simplification**: You want to display related data without 
affecting grouping behavior
   
   ### Example Scenario
   
   Consider an `ORDERS` table with `CustomerID`, `CustomerName`, and 
`OrderValue`:
   
   **Without ANY_VALUE (Standard SQL):**
   ```sql
   -- Option 1: Add CustomerName to GROUP BY (more computation)
   SELECT CustomerID, CustomerName, SUM(OrderValue)
   FROM Orders 
   GROUP BY CustomerID, CustomerName;
   
   -- Option 2: Remove CustomerName from SELECT (lose information)
   SELECT CustomerID, SUM(OrderValue)
   FROM Orders 
   GROUP BY CustomerID;
   ```
   
   **With ANY_VALUE (Optimized):**
   ```sql
   -- Get CustomerName without adding to GROUP BY
   SELECT CustomerID, ANY_VALUE(CustomerName), SUM(OrderValue)
   FROM Orders 
   GROUP BY CustomerID;
   ```
   
   ## Behavior
   
   - **Returns**: Any arbitrary value from the column for each group
   - **Null Handling**: Supports both null-handling enabled and disabled modes
   - **Data Types**: Works with all Pinot data types (INT, LONG, FLOAT, DOUBLE, 
STRING, BYTES, BIG_DECIMAL)
   - **Performance**: More efficient than adding the column to `GROUP BY` when 
there's a 1:1 mapping
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] [Query Engine] ANY VALUE Support [pinot]

Reply via email to