mingmwang commented on code in PR #124:
URL: https://github.com/apache/arrow-ballista/pull/124#discussion_r945671491


##########
ballista/rust/scheduler/src/state/execution_graph.rs:
##########
@@ -153,6 +157,7 @@ impl ExecutionStage {
             task_statuses: vec![None; num_tasks],
             output_link,
             resolved,
+            stage_metrics: None,

Review Comment:
   > 
   > Even aggregated we're adding more data to a value that has to be 
read/written/decoded/encoded a lot
   
   I agree with you that the `ExecutionGraph` is not the perfect place to store 
metrics. But since there is no in-memory stat store now. And I do not want to 
add another key space to sled/etcd to store the metrics, so just reuse the 
existing `ExecutionGraph`.  I think it is not that heavy compared to the 
existing task_statuses array If we have thousands of tasks.  
   
   ```
   /// Status of each already scheduled task. If status is None, the partition 
has not yet been scheduled
    pub(crate) task_statuses: Vec<Option<task_status::Status>>,
   ```
   
   Ideally those metrics should be stored in memory because currently they are 
just for displaying purpose.
   For long terms preserving, both the task status and metrics should be 
persisted to other places.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to