tanishqgandhi1908 opened a new pull request, #5896:
URL: https://github.com/apache/texera/pull/5896
### What changes were proposed in this PR?
This PR adds support for `COUNT(*)` in the **Aggregate** operator, so users
can count all rows without having to pick a column.
A dedicated **`count(*)`** function is added alongside the existing `count`:
- **`count`** — counts the non-null values of a selected column (unchanged
behavior).
- **`count(*)`** — counts every row, including rows with nulls; no column
needed.
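The difference between the two functions can be illustrated with a small TypeScript sketch. The `Row` type and the helper names are hypothetical and only model the described semantics, not the operator's actual code. `count` skips nulls in the chosen column, while `count(*)` counts rows:

```typescript
// Hypothetical row type: column name -> value, where null marks a missing value.
type Row = Record<string, string | number | null>;

// `count`: counts rows whose value in `attribute` is non-null.
// Loose `!= null` also treats a column absent from the row as null.
function countColumn(rows: Row[], attribute: string): number {
  return rows.filter((row) => row[attribute] != null).length;
}

// `count(*)`: counts every row, nulls included; no column is needed.
function countStar(rows: Row[]): number {
  return rows.length;
}

const rows: Row[] = [
  { city: "Irvine", sales: 10 },
  { city: "Austin", sales: null },
  { city: null, sales: 5 },
];

console.log(countColumn(rows, "sales")); // 2
console.log(countStar(rows)); // 3
```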
**Backend**
- New `COUNT_STAR("count(*)")` aggregation function; `countStarAgg` counts
every row, and `getFinal` rewrites both count variants to `SUM` for the global
stage.
- `attribute` is now required for every function **except** `count(*)`,
enforced via a conditional JSON-schema rule. This gates execution (validated by
Ajv), so a missing attribute on `count`/`sum`/etc. correctly makes the operator
invalid.
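The following sketch shows why `getFinal` can map both count variants to `SUM`. It assumes a two-stage plan in which each worker computes a partial result over its own partition and a global stage then merges those partials. `AggFunction`, `partialAggregate`, `finalFunction` and `globalAggregate` are hypothetical names. Partial counts must be added, not counted again, so the global stage uses SUM for `count` and `count(*)` alike:

```typescript
// Hypothetical row type: column name -> value, where null marks a missing value.
type Row = Record<string, string | number | null>;

// Hypothetical stand-in for the aggregation-function enum (e.g. COUNT_STAR).
type AggFunction = "count" | "count(*)" | "sum";

// Local stage: each worker aggregates only the rows of its own partition.
function partialAggregate(rows: Row[], fn: AggFunction, attribute = ""): number {
  if (fn === "count(*)") return rows.length; // every row; attribute ignored
  if (fn === "count") return rows.filter((row) => row[attribute] != null).length;
  // sum: add numeric values of the attribute, skipping nulls
  return rows.reduce((acc, row) => {
    const v = row[attribute];
    return typeof v === "number" ? acc + v : acc;
  }, 0);
}

// Analogue of getFinal: the function the global stage applies to partial results.
// Partial counts are added together, so both count variants become sum.
function finalFunction(fn: AggFunction): AggFunction {
  return fn === "count" || fn === "count(*)" ? "sum" : fn;
}

// Global stage: every final function in this sketch is sum, so partials are added.
function globalAggregate(partials: number[]): number {
  return partials.reduce((acc, p) => acc + p, 0);
}

const partitions: Row[][] = [
  [{ sales: 10 }, { sales: null }],
  [{ sales: 5 }, { sales: null }, { sales: 7 }],
];
const partials = partitions.map((p) => partialAggregate(p, "count(*)"));
console.log(finalFunction("count(*)")); // sum
console.log(partials, globalAggregate(partials)); // [ 2, 3 ] 5
```

Running `count(*)` again on the partials would return 2, which is the number of partitions and not the number of rows.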
**Frontend** (Aggregate only)
- When `count(*)` is selected, the Attribute field is **disabled** (greyed
out, which keeps each aggregation row's layout consistent) and any previously
selected column is cleared.
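The frontend rule is sketched below as a pure function over one aggregation row. `AggregationRow`, `AttributeFieldState` and `applyCountStarRule` are hypothetical. The sketch does not reproduce `jsonSchemaMapIntercept` or the form library it feeds. It only models the behavior described: disable the field and drop a stale column instead of hiding the field.

```typescript
// Hypothetical shape of one aggregation row in the form model.
interface AggregationRow {
  aggFunction: string;
  attribute?: string;
}

// Hypothetical UI state for that row's Attribute field.
interface AttributeFieldState {
  disabled: boolean;
}

// When count(*) is picked, clear any previously selected column and disable
// the field. The field stays rendered, so every row keeps the same layout.
function applyCountStarRule(row: AggregationRow): {
  row: AggregationRow;
  field: AttributeFieldState;
} {
  if (row.aggFunction === "count(*)") {
    return { row: { ...row, attribute: undefined }, field: { disabled: true } };
  }
  return { row, field: { disabled: false } };
}

const after = applyCountStarRule({ aggFunction: "count(*)", attribute: "sales" });
console.log(after.row.attribute, after.field.disabled); // undefined true
```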
**Docs**
- Updated the Aggregate operator reference page.
#### Screenshots
`count(*)` selected — the Attribute field is disabled, and the result counts
all rows:
<img width="2872" height="1618" alt="image"
src="https://github.com/user-attachments/assets/16ef17cd-2872-4d61-829c-c968dc9464f2"
/>
### Any related issues, documentation, discussions?
Closes #3142.
### How was this PR tested?
**Automated (unit + integration, `AggregateOpSpec` /
`AggregateOpDescSpec`):**
- `count(*)` counts every row, including rows with nulls, and ignores any
attribute value that leaks through.
- `count` counts only non-null values of its column.
- `getAggregationAttribute` / `getFinal` handle `COUNT_STAR`.
- Schema-propagation guard (`AggregateOpDesc`) and executor guard
(`AggregateOpExec`) tolerate a blank `count(*)` attribute without dereferencing
a non-existent column; the result column is typed `INTEGER`.
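The schema-propagation guard works roughly as follows. This minimal sketch covers output-type inference for one aggregation: `count(*)` returns `INTEGER` without looking up its possibly blank attribute, and other functions require the column to exist. The types and `resultType` are hypothetical. The result types for `count` and `sum` are assumptions made only for this sketch.

```typescript
// Hypothetical attribute types and input schema (column name -> type).
type AttributeType = "INTEGER" | "DOUBLE" | "STRING";
type Schema = Map<string, AttributeType>;

interface Aggregation {
  aggFunction: "count" | "count(*)" | "sum";
  attribute: string; // may be blank ("") when aggFunction is count(*)
}

// Output type of one aggregation, guarded so count(*) never dereferences its attribute.
function resultType(agg: Aggregation, input: Schema): AttributeType {
  if (agg.aggFunction === "count(*)") {
    return "INTEGER"; // typed INTEGER without touching the attribute
  }
  const columnType = input.get(agg.attribute);
  if (columnType === undefined) {
    throw new Error(`attribute "${agg.attribute}" is not in the input schema`);
  }
  // Assumption for this sketch only: count is INTEGER, sum keeps the column type.
  return agg.aggFunction === "count" ? "INTEGER" : columnType;
}

const input: Schema = new Map<string, AttributeType>([
  ["sales", "DOUBLE"],
  ["city", "STRING"],
]);
console.log(resultType({ aggFunction: "count(*)", attribute: "" }, input)); // INTEGER
console.log(resultType({ aggFunction: "sum", attribute: "sales" }, input)); // DOUBLE
```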
**Manual (UI):** verified the behavior shown in the screenshot above. Selecting
`count(*)` disables and clears the Attribute field and counts all rows, while
`count`/`sum` with an empty Attribute mark the operator invalid and disable Run.
Both cases were confirmed with and without Group By. The frontend logic lives in
`jsonSchemaMapIntercept` alongside similar per-operator expression rules (e.g.
FileScanOp), which are likewise validated manually rather than unit-tested.
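The invalid/valid outcomes above come from the conditional JSON-schema rule described under **Backend**. The sketch below shows one way such a rule can be written with JSON Schema `if`/`else`, paired with a hand-written check that mirrors it. The field name `aggFunction`, the `minLength` constraint and `satisfiesAttributeRule` are assumptions. The PR states only that `attribute` is required except for `count(*)` and that Ajv performs the validation.

```typescript
// Hypothetical schema fragment for one aggregation row. The field name
// `aggFunction` is an assumption; only `attribute` comes from the PR text.
const aggregationRowSchema = {
  type: "object",
  properties: {
    aggFunction: { type: "string" },
    attribute: { type: "string" },
  },
  // `required` inside `if` makes the condition fail when aggFunction is absent.
  if: { properties: { aggFunction: { const: "count(*)" } }, required: ["aggFunction"] },
  // No `then`: count(*) imposes no extra constraint.
  else: { required: ["attribute"], properties: { attribute: { minLength: 1 } } },
};

interface AggregationRow {
  aggFunction?: string;
  attribute?: string;
}

// Hand-written mirror of the if/else rule above (the PR relies on Ajv instead).
function satisfiesAttributeRule(row: AggregationRow): boolean {
  if (row.aggFunction === "count(*)") return true; // `if` matched, no `then`
  // `else` branch: attribute must be present and non-empty.
  return typeof row.attribute === "string" && row.attribute.length >= 1;
}

console.log(satisfiesAttributeRule({ aggFunction: "count(*)" })); // true
console.log(satisfiesAttributeRule({ aggFunction: "sum", attribute: "" })); // false
console.log(JSON.stringify(aggregationRowSchema.else.required)); // ["attribute"]
```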
### Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.8)
--
This is an automated message from the Apache Git Service.