PG1204 opened a new issue, #5773:
URL: https://github.com/apache/texera/issues/5773
### Task Summary
Type the per-operator statistics fields the backend already sends but the
frontend drops, and expose a derived performance model from
`WorkflowStatusService`, as the data foundation for the workflow heat-map
overlay. No UI change.
### Context
The execution engine emits 11 statistics per operator over the websocket,
but the frontend `OperatorStatistics` type in
`frontend/src/app/workspace/types/execute-workflow.interface.ts` declares only
6. The full set of 11, with the five missing fields shown as optional, is:
```typescript
export interface OperatorStatistics
extends Readonly<{
operatorState: OperatorState;
aggregatedInputRowCount: number;
aggregatedInputSize?: number; // bytes
inputPortMetrics: Record<string, number>;
aggregatedOutputRowCount: number;
aggregatedOutputSize?: number; // bytes
outputPortMetrics: Record<string, number>;
numWorkers?: number;
aggregatedDataProcessingTime?: number; // nanoseconds
aggregatedControlProcessingTime?: number; // nanoseconds
aggregatedIdleTime?: number; // nanoseconds
}> {}
```
The five timing and byte-size fields (`aggregatedInputSize`,
`aggregatedOutputSize`, `aggregatedDataProcessingTime`,
`aggregatedControlProcessingTime`, and `aggregatedIdleTime`) all arrive in the
JSON payload, but because they are untyped and unused, they are silently
discarded.
`WorkflowStatusService` is currently a thin pass-through over the
`OperatorStatisticsUpdateEvent` stream, with neither a derived performance
model nor a spec file. This sub-task makes it the single source of truth for
per-operator performance data, ahead of the overlay built in the following
sub-tasks.
- Before: websocket -> `OperatorStatistics` (6 fields; timing and sizes dropped)
- After: websocket -> `OperatorStatistics` (11 fields) -> derived, normalized
  per-operator performance metrics
### Proposed Change
In `frontend/src/app/workspace/types/execute-workflow.interface.ts`:
- Add the five missing fields to `OperatorStatistics`, all optional, so the
partial objects built in `resetStatus` / `clearStatus` still type-check.
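A minimal sketch of why the new fields need to be optional. The trimmed `OperatorStatisticsSketch` type, the string-union `OperatorState`, and the `emptyStatistics` helper are illustrative assumptions, not the actual Texera code; they only mimic a reset path that builds a partial object.

```typescript
// Illustrative stand-in for the real OperatorState type (assumption).
type OperatorState = "Uninitialized" | "Running" | "Completed";

// Trimmed copy of the interface from the Context section.
interface OperatorStatisticsSketch {
  readonly operatorState: OperatorState;
  readonly aggregatedInputRowCount: number;
  readonly inputPortMetrics: Record<string, number>;
  readonly aggregatedOutputRowCount: number;
  readonly outputPortMetrics: Record<string, number>;
  readonly numWorkers?: number;
  // The five fields this sub-task adds, all optional.
  readonly aggregatedInputSize?: number; // bytes
  readonly aggregatedOutputSize?: number; // bytes
  readonly aggregatedDataProcessingTime?: number; // nanoseconds
  readonly aggregatedControlProcessingTime?: number; // nanoseconds
  readonly aggregatedIdleTime?: number; // nanoseconds
}

// Hypothetical helper resembling what a reset path might build.
// It omits every new field and still satisfies the interface.
function emptyStatistics(): OperatorStatisticsSketch {
  return {
    operatorState: "Uninitialized",
    aggregatedInputRowCount: 0,
    inputPortMetrics: {},
    aggregatedOutputRowCount: 0,
    outputPortMetrics: {},
  };
}

const reset = emptyStatistics();
// Consumers must handle the missing value explicitly.
const idleNs = reset.aggregatedIdleTime ?? 0;
```

Had the fields been declared as required, `emptyStatistics` would fail to compile until every reset path supplied placeholder values.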
New `frontend/src/app/workspace/service/workflow-status/performance-metrics.ts`
(pure, framework-free):
- A derived `OperatorPerformanceMetrics` model and a `toPerformanceMetrics`
mapper that defensively defaults every field to 0.
- A `HeatmapView` enum (Runtime, Throughput, I/O imbalance) and a
`rawMetricForView` selector.
- A `normalizeScores` helper returning [0, 1] scores, with skew handling and
defined empty / single-operator / all-equal behavior.
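The three pieces above could be sketched roughly as follows. The names come from this issue, but everything else is an assumption made for illustration: the field names of `OperatorPerformanceMetrics`, the per-view formulas, the log damping used for skew, and the choice to score all-equal (and single-operator) inputs as 0.

```typescript
// Subset of the websocket payload; every field optional so partial data is tolerated.
interface RawStats {
  aggregatedInputRowCount?: number;
  aggregatedOutputRowCount?: number;
  aggregatedInputSize?: number; // bytes
  aggregatedOutputSize?: number; // bytes
  aggregatedDataProcessingTime?: number; // nanoseconds
  aggregatedControlProcessingTime?: number; // nanoseconds
  aggregatedIdleTime?: number; // nanoseconds
}

// Hypothetical shape of the derived model.
interface OperatorPerformanceMetrics {
  inputRows: number;
  outputRows: number;
  inputBytes: number;
  outputBytes: number;
  dataProcessingTimeNs: number;
  controlProcessingTimeNs: number;
  idleTimeNs: number;
}

enum HeatmapView {
  Runtime = "runtime",
  Throughput = "throughput",
  IoImbalance = "ioImbalance",
}

// Coerce anything that is not a finite number (undefined, NaN, Infinity) to 0.
const num = (v: unknown): number => (typeof v === "number" && isFinite(v) ? v : 0);

function toPerformanceMetrics(s: RawStats): OperatorPerformanceMetrics {
  return {
    inputRows: num(s.aggregatedInputRowCount),
    outputRows: num(s.aggregatedOutputRowCount),
    inputBytes: num(s.aggregatedInputSize),
    outputBytes: num(s.aggregatedOutputSize),
    dataProcessingTimeNs: num(s.aggregatedDataProcessingTime),
    controlProcessingTimeNs: num(s.aggregatedControlProcessingTime),
    idleTimeNs: num(s.aggregatedIdleTime),
  };
}

// Assumed formulas: runtime = data processing time, throughput = output rows
// per second, imbalance = output rows / input rows. Zero denominators yield 0.
function rawMetricForView(m: OperatorPerformanceMetrics, view: HeatmapView): number {
  switch (view) {
    case HeatmapView.Runtime:
      return m.dataProcessingTimeNs;
    case HeatmapView.Throughput:
      return m.dataProcessingTimeNs > 0 ? m.outputRows / (m.dataProcessingTimeNs / 1e9) : 0;
    case HeatmapView.IoImbalance:
      return m.inputRows > 0 ? m.outputRows / m.inputRows : 0;
    default:
      return 0;
  }
}

// Min-max normalization after log damping (one possible way to handle skew).
function normalizeScores(values: number[]): number[] {
  if (values.length === 0) return [];
  const damped = values.map(v => Math.log(1 + Math.max(0, num(v))));
  const min = Math.min(...damped);
  const max = Math.max(...damped);
  if (max === min) return damped.map(() => 0); // single or all-equal (assumption)
  return damped.map(d => Math.min(1, Math.max(0, (d - min) / (max - min))));
}
```

Keeping these functions free of Angular and RxJS means the edge cases listed under Required Test can be exercised with plain function calls.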
In `frontend/src/app/workspace/service/workflow-status/workflow-status.service.ts`:
- Keep the existing pass-through API unchanged.
- Add a derived performance-metrics stream (`BehaviorSubject`-backed) plus a
synchronous snapshot getter.
- Add a public `setExternalStatus(...)` ingestion method so a later sub-task
can feed in restored historical statistics without touching private state.
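A rough, dependency-free sketch of the service additions listed above. To keep it runnable on its own, `ValueSubject` is a tiny stand-in for RxJS `BehaviorSubject`; the real service would presumably use RxJS directly. The class name `WorkflowStatusSketch`, the method names other than `setExternalStatus`, the event shape, and the single-field metrics model are all assumptions made for illustration.

```typescript
// Minimal stand-in for RxJS BehaviorSubject: holds a current value and
// replays it to each new subscriber.
class ValueSubject<T> {
  private current: T;
  private listeners: Array<(v: T) => void> = [];
  constructor(initial: T) {
    this.current = initial;
  }
  getValue(): T {
    return this.current;
  }
  next(v: T): void {
    this.current = v;
    this.listeners.forEach(l => l(v));
  }
  subscribe(l: (v: T) => void): () => void {
    this.listeners.push(l);
    l(this.current);
    return () => {
      this.listeners = this.listeners.filter(x => x !== l);
    };
  }
}

type Stats = { aggregatedDataProcessingTime?: number };
type MetricsById = Record<string, { dataProcessingTimeNs: number }>;
// Assumed event shape; only the event name comes from the issue.
type WorkflowEvent = { type: string; operatorStatistics?: Record<string, Stats> };

class WorkflowStatusSketch {
  private readonly metrics = new ValueSubject<MetricsById>({});

  // Single private ingestion path shared by live events and external status.
  private ingest(status: Record<string, Stats>): void {
    const derived: MetricsById = {};
    for (const id of Object.keys(status)) {
      const stats = status[id];
      derived[id] = { dataProcessingTimeNs: stats.aggregatedDataProcessingTime ?? 0 };
    }
    this.metrics.next(derived);
  }

  // Ignores every event except statistics updates.
  handleEvent(event: WorkflowEvent): void {
    if (event.type !== "OperatorStatisticsUpdateEvent" || !event.operatorStatistics) return;
    this.ingest(event.operatorStatistics);
  }

  // Public entry point for restored historical statistics.
  setExternalStatus(status: Record<string, Stats>): void {
    this.ingest(status);
  }

  // Hypothetical stream accessor; returns an unsubscribe function.
  subscribeToPerformanceMetrics(listener: (m: MetricsById) => void): () => void {
    return this.metrics.subscribe(listener);
  }

  // Hypothetical synchronous snapshot getter.
  getCurrentPerformanceMetrics(): MetricsById {
    return this.metrics.getValue();
  }
}
```

Routing both live events and `setExternalStatus` through one private method means restored history and live updates produce identically shaped metrics.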
### Required Test
- New `performance-metrics.spec.ts`: full mapping, missing-field defaults (no
`NaN`), all-zero, I/O-imbalance ratio with `inputRows === 0`, Unicode operator
id, and `normalizeScores` edge cases (empty, single, all-equal, two-value,
heavy-tail, clamping).
- New `workflow-status.service.spec.ts`: derived stream emits on
`OperatorStatisticsUpdateEvent`, non-matching events are ignored, the snapshot
getter is synchronous, and `resetStatus` / `clearStatus` / `setExternalStatus`
behave as expected.
### Related
- Umbrella issue #5772
- RFC discussion #5216
### Task Type
- [x] Refactor / Cleanup
- [ ] DevOps / Deployment / CI
- [ ] Testing / QA
- [ ] Documentation
- [x] Performance
- [ ] Other