thinkharderdev commented on code in PR #124:
URL: https://github.com/apache/arrow-ballista/pull/124#discussion_r945655343


##########
ballista/rust/scheduler/src/state/execution_graph.rs:
##########
@@ -153,6 +157,7 @@ impl ExecutionStage {
             task_statuses: vec![None; num_tasks],
             output_link,
             resolved,
+            stage_metrics: None,

Review Comment:
   Right, that helps, but I still think that storing the metrics in the 
`ExecutionGraph` is not the right way to go. Even aggregated, we're adding more 
data to a value that has to be read/written/decoded/encoded a lot. The other 
problem, I think, is that ideally we want to discard the full `ExecutionGraph` 
soon after the job completes to prevent unbounded growth in the state store, 
whereas metrics may be something we wish to preserve for longer (or even 
indefinitely). In addition, I think this may very well be an area where 
extensibility matters. The `ExecutionGraph` is an internal 
implementation detail, but users may want to store and analyze 
metrics, and may wish to "export" them to a system of their choosing 
(an RDBMS, etc.). Having an interface that allows for that would be helpful. 
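A minimal sketch of the kind of interface described above, assuming metrics are reported per stage and aggregated outside the `ExecutionGraph`. Every name here (`MetricsCollector`, `StageMetrics`, `InMemoryMetricsCollector`, the string job id and `usize` stage id) is hypothetical and not part of Ballista's API; it only illustrates keeping metrics off the graph's encode/decode path and letting them outlive the graph.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical aggregated metrics for one stage of a job.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct StageMetrics {
    pub output_rows: u64,
    pub elapsed_compute_nanos: u64,
}

/// Hypothetical pluggable metrics sink. It lives outside the `ExecutionGraph`,
/// so metrics are never serialized with the graph and can outlive it.
pub trait MetricsCollector: Send + Sync {
    /// Merge metrics reported for `stage_id` of `job_id` into the stored totals.
    fn record_stage_metrics(&self, job_id: &str, stage_id: usize, metrics: &StageMetrics);
    /// Return whatever the collector has retained for a job, if anything.
    fn job_metrics(&self, job_id: &str) -> Option<HashMap<usize, StageMetrics>>;
}

/// Simple in-memory implementation. An exporter to an RDBMS or another
/// external system would implement the same trait (hypothetical).
#[derive(Default)]
pub struct InMemoryMetricsCollector {
    jobs: Mutex<HashMap<String, HashMap<usize, StageMetrics>>>,
}

impl MetricsCollector for InMemoryMetricsCollector {
    fn record_stage_metrics(&self, job_id: &str, stage_id: usize, metrics: &StageMetrics) {
        let mut jobs = self.jobs.lock().unwrap();
        let entry = jobs
            .entry(job_id.to_string())
            .or_default()
            .entry(stage_id)
            .or_default();
        // Aggregate per-task reports into a single per-stage value.
        entry.output_rows += metrics.output_rows;
        entry.elapsed_compute_nanos += metrics.elapsed_compute_nanos;
    }

    fn job_metrics(&self, job_id: &str) -> Option<HashMap<usize, StageMetrics>> {
        self.jobs.lock().unwrap().get(job_id).cloned()
    }
}

fn main() {
    // The scheduler would hold a trait object, so the backend is swappable.
    let collector: Arc<dyn MetricsCollector> = Arc::new(InMemoryMetricsCollector::default());

    // Two tasks of stage 1 report their metrics as they complete.
    collector.record_stage_metrics("job-1", 1, &StageMetrics { output_rows: 100, elapsed_compute_nanos: 5_000 });
    collector.record_stage_metrics("job-1", 1, &StageMetrics { output_rows: 50, elapsed_compute_nanos: 2_000 });

    // At this point the ExecutionGraph for job-1 could be dropped from the
    // state store; the aggregated metrics are still available here.
    let metrics = collector.job_metrics("job-1").unwrap();
    assert_eq!(metrics[&1].output_rows, 150);
    println!("{:?}", metrics[&1]);
}
```

With something along these lines, `ExecutionStage` would not need a `stage_metrics` field at all; retention and export policy would be decided by whichever collector implementation is configured.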



-- 
This is an automated message from the Apache Git Service.