mingmwang commented on code in PR #124:
URL: https://github.com/apache/arrow-ballista/pull/124#discussion_r945671491
##########
ballista/rust/scheduler/src/state/execution_graph.rs:
##########
@@ -153,6 +157,7 @@ impl ExecutionStage {
task_statuses: vec![None; num_tasks],
output_link,
resolved,
+ stage_metrics: None,
Review Comment:
>
> Even aggregated we're adding more data to a value that has to be
read/written/decoded/encoded a lot
I agree with you that the `ExecutionGraph` is not the perfect place to store
metrics. But since there is no in-memory stat store now. And I do not want to
add another key space to sled/etcd to store the metrics, so just reuse the
existing `ExecutionGraph`. I think it is not that heavy compared to the
existing task_statuses array If we have thousands of tasks.
```
/// Status of each already scheduled task. If status is None, the partition
has not yet been scheduled
pub(crate) task_statuses: Vec<Option<task_status::Status>>,
```
Ideally those metrics should be stored in memory because currently they are
just for displaying purpose.
For long terms preserving, both the task status and metrics should be
persisted to other places.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]