tanishqgandhi1908 opened a new pull request, #5896:
URL: https://github.com/apache/texera/pull/5896

   ### What changes were proposed in this PR?
   
   This PR adds support for `COUNT(*)` in the **Aggregate** operator, so users 
can count all rows without having to pick a column. 
   
   A dedicated **`count(*)`** function is added alongside the existing `count`:
   
   - **`count`** — counts the non-null values of a selected column (unchanged 
behavior).
   - **`count(*)`** — counts every row, including rows with nulls; no column 
needed.
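
The difference between the two functions can be sketched as follows. This is an illustrative model only, not the operator's code: the `Row` type, the helper names, and the sample data are all hypothetical.

```typescript
// Hypothetical row shape: any column value may be null.
type Row = Record<string, string | number | null>;

// count: number of rows whose value in `column` is non-null.
function countColumn(rows: Row[], column: string): number {
  return rows.filter((row) => row[column] != null).length;
}

// count(*): number of rows, nulls included; no column is needed.
function countStar(rows: Row[]): number {
  return rows.length;
}

// Made-up sample data for illustration.
const sampleRows: Row[] = [
  { city: "Irvine", price: 10 },
  { city: null, price: 20 },
  { city: "Austin", price: null },
  { city: null, price: null },
];

console.log(countColumn(sampleRows, "city")); // 2
console.log(countColumn(sampleRows, "price")); // 2
console.log(countStar(sampleRows)); // 4
```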
   
   **Backend**
   - New `COUNT_STAR("count(*)")` aggregation function; `countStarAgg` counts 
every row, and `getFinal` rewrites both count variants to `SUM` for the global 
stage.
   - `attribute` is now required for every function **except** `count(*)`, 
enforced via a conditional JSON-schema rule. Because the schema is validated by 
Ajv before execution, a missing attribute on `count`/`sum`/etc. makes the 
operator invalid and blocks the run.
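
A conditional rule of this kind could be written with JSON Schema's `if`/`else` keywords (draft-07 and later), which Ajv supports. The sketch below is an assumption about the shape of such a rule, not the schema added in this PR: the property names `aggFunction` and `attribute` and the helper `isAggregationValid` are hypothetical. The helper mirrors the rule by hand so the example runs without Ajv.

```typescript
// Hypothetical schema fragment: `attribute` is required unless
// `aggFunction` is exactly "count(*)".
const aggregationSchemaSketch = {
  type: "object",
  properties: {
    aggFunction: { type: "string" },
    attribute: { type: "string" },
  },
  required: ["aggFunction"],
  if: { properties: { aggFunction: { const: "count(*)" } } },
  else: { required: ["attribute"] },
} as const;

interface AggregationSketch {
  aggFunction: string;
  attribute?: string;
}

// Hand-written mirror of the rule above. Like JSON Schema's `required`,
// it checks only that `attribute` is present, not that it is non-empty.
function isAggregationValid(agg: AggregationSketch): boolean {
  if (agg.aggFunction === "count(*)") {
    return true;
  }
  return agg.attribute !== undefined;
}

console.log(isAggregationValid({ aggFunction: "count(*)" })); // true
console.log(isAggregationValid({ aggFunction: "count" })); // false
console.log(isAggregationValid({ aggFunction: "sum", attribute: "price" })); // true
```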
   
   **Frontend** (Aggregate only)
   - When `count(*)` is selected, the Attribute field is **disabled** (greyed 
out, keeping
     each aggregation row's layout consistent) and any previously-selected 
column is cleared.
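
The disable-and-clear behavior described above might be modeled as a pure function over one aggregation row. This is a simplified sketch, not the actual `jsonSchemaMapIntercept` logic; `AggregationRow`, `AttributeFieldState`, and `applyCountStarRule` are hypothetical names.

```typescript
// Hypothetical form model for one aggregation row.
interface AggregationRow {
  aggFunction: string;
  attribute?: string;
}

interface AttributeFieldState {
  row: AggregationRow;
  attributeDisabled: boolean;
}

// When count(*) is selected, disable the Attribute field and drop any
// previously selected column; otherwise leave the row untouched.
function applyCountStarRule(row: AggregationRow): AttributeFieldState {
  if (row.aggFunction !== "count(*)") {
    return { row, attributeDisabled: false };
  }
  const cleared: AggregationRow = { ...row };
  delete cleared.attribute;
  return { row: cleared, attributeDisabled: true };
}

const switched = applyCountStarRule({ aggFunction: "count(*)", attribute: "price" });
console.log(switched.attributeDisabled); // true
console.log(switched.row.attribute); // undefined
```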
   
   **Docs**
   - Updated the Aggregate operator reference page.
   
   #### Screenshots
   
   `count(*)` selected — the Attribute field is disabled, and the result counts 
all rows:
   
   <img width="2872" height="1618" alt="image" src="https://github.com/user-attachments/assets/16ef17cd-2872-4d61-829c-c968dc9464f2" />
   
   
   ### Any related issues, documentation, discussions?
   
   Closes #3142.
   
   ### How was this PR tested?
   
   **Automated (unit + integration, `AggregateOpSpec` / 
`AggregateOpDescSpec`):**
   - `count(*)` counts every row including nulls, and ignores any attribute 
value that
     leaks through.
   - `count` counts only non-null values of its column.
   - `getAggregationAttribute` / `getFinal` handle `COUNT_STAR`.
   - Schema-propagation guard (`AggregateOpDesc`) and executor guard 
(`AggregateOpExec`) tolerate a blank `count(*)` attribute without dereferencing 
a non-existent column; the result column is typed `INTEGER`.
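
Why the global stage uses `SUM` can be seen in a small two-stage model: each partition produces a partial count, and the final result adds those partials instead of counting them again. The code is a hypothetical sketch (`Tuple`, `localCount`, and `globalCount` are illustrative names), not the `AggregateOpExec` implementation.

```typescript
type Tuple = Record<string, string | number | null>;
type CountKind = "count" | "count(*)";

// Local stage: each worker counts within its own partition.
function localCount(partition: Tuple[], kind: CountKind, attribute?: string): number {
  if (kind === "count(*)") {
    // Any attribute value that leaks through is ignored.
    return partition.length;
  }
  if (attribute === undefined) {
    throw new Error("count requires an attribute");
  }
  const column: string = attribute;
  return partition.filter((t) => t[column] != null).length;
}

// Global stage: partial counts are combined with SUM.
function globalCount(partials: number[]): number {
  return partials.reduce((acc, n) => acc + n, 0);
}

// Made-up data split across two partitions.
const partitions: Tuple[][] = [
  [{ a: 1 }, { a: null }],
  [{ a: null }, { a: 3 }, { a: 4 }],
];

const starTotal = globalCount(partitions.map((p) => localCount(p, "count(*)", "a")));
const columnTotal = globalCount(partitions.map((p) => localCount(p, "count", "a")));
console.log(starTotal); // 5
console.log(columnTotal); // 3
```

Counting the partial results a second time would return 2 (the number of partitions) for both variants, which is why both are rewritten to a sum.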
   
   
   **Manual (UI):** reproduced the behavior shown in the screenshot above. 
Selecting `count(*)` disables and clears the Attribute field and counts all 
rows; `count`/`sum` with an empty Attribute field marks the operator invalid 
and disables Run. Both cases were checked with and without Group By. The 
frontend logic lives in `jsonSchemaMapIntercept` alongside similar per-operator 
expression rules (e.g. FileScanOp), which are likewise validated manually 
rather than unit-tested.
   
   ### Was this PR authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Claude Opus 4.8)

