thinkharderdev commented on issue #116:
URL: https://github.com/apache/arrow-ballista/issues/116#issuecomment-1206311766
In our project we've used the `ExecutorMetricsCollector` for this purpose,
and it seems to work relatively well. There is some additional plumbing needed,
since you need an RPC service on the scheduler side that can collect
the metrics (and store them somewhere).
Basically, it looks something like this:
```rust
struct MetricsPublisher {
    rx: Receiver<StageCompletion>,
    metrics_client: QueryManagerServiceClient<tonic::transport::Channel>,
}
```
which listens for completion events and calls the scheduler RPC in the
background to publish them for aggregation,
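A minimal sketch of that background loop, with self-contained stand-ins for the tonic client and the event type (the real `QueryManagerServiceClient` is generated from your own proto definitions, and `StageCompletion` is a project-specific struct, so the fields and method names below are assumptions for illustration):

```rust
use std::sync::mpsc::Receiver;

// Stand-in for the real completion event; the actual struct would also
// carry the job/stage identifiers and the shuffle-write plan.
struct StageCompletion {
    partition_id: usize,
}

// Stand-in for the generated tonic client; the real one would make an
// async RPC to the scheduler's metrics-collection service.
struct QueryManagerServiceClient {
    published: Vec<usize>,
}

impl QueryManagerServiceClient {
    fn publish_stage_metrics(&mut self, event: StageCompletion) {
        self.published.push(event.partition_id);
    }
}

struct MetricsPublisher {
    rx: Receiver<StageCompletion>,
    metrics_client: QueryManagerServiceClient,
}

impl MetricsPublisher {
    // Drain completion events and forward each one to the scheduler.
    // In the real implementation this would be an async task (e.g. a
    // tokio task over an mpsc channel) running for the executor's lifetime.
    fn run(&mut self) {
        while let Ok(event) = self.rx.recv() {
            self.metrics_client.publish_stage_metrics(event);
        }
    }
}

fn main() {
    let (tx, rx) = std::sync::mpsc::channel();
    let mut publisher = MetricsPublisher {
        rx,
        metrics_client: QueryManagerServiceClient { published: Vec::new() },
    };
    tx.send(StageCompletion { partition_id: 0 }).unwrap();
    tx.send(StageCompletion { partition_id: 1 }).unwrap();
    drop(tx); // closing the channel ends the loop
    publisher.run();
    println!("{:?}", publisher.metrics_client.published); // prints [0, 1]
}
```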
And then something like:
```rust
pub struct ExecutionMetricsCollector {
    prometheus_metrics: PrometheusMetrics,
    sender: Sender<StageCompletion>,
}

impl ExecutorMetricsCollector for ExecutionMetricsCollector {
    fn record_stage(
        &self,
        job_id: &str,
        stage_id: usize,
        partition: usize,
        plan: ShuffleWriterExec,
    ) {
        self.sender
            .try_send(StageCompletion {
                partition_id: partition,
                shuffle_write: plan,
            })
            .unwrap_or_else(|err| {
                warn!(
                    job_id,
                    stage_id,
                    partition,
                    "error sending stage completion to channel: {:?}",
                    err
                )
            });
    }
}
```
Then of course you need to serialize the `ExecutionMetricsSet` to a protobuf
struct, but that is relatively straightforward.
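A hedged sketch of that serialization step, with hypothetical types (the proto message here would live in your own `.proto` file, not Ballista's, and the metric representation is a simplified stand-in for what you'd get by walking the plan's collected metrics):

```rust
// Hypothetical protobuf-generated struct; in practice this would come
// from your own .proto definition via prost/tonic codegen.
#[derive(Debug, PartialEq)]
struct MetricProto {
    name: String,
    value: u64,
}

// Simplified stand-in for one collected metric: a named counter such
// as "output_rows" or "elapsed_compute".
struct NamedMetric {
    name: &'static str,
    value: usize,
}

// Flatten the collected metrics into protobuf structs for the RPC call.
fn to_proto(metrics: &[NamedMetric]) -> Vec<MetricProto> {
    metrics
        .iter()
        .map(|m| MetricProto {
            name: m.name.to_string(),
            value: m.value as u64,
        })
        .collect()
}

fn main() {
    let metrics = [
        NamedMetric { name: "output_rows", value: 1024 },
        NamedMetric { name: "elapsed_compute", value: 5 },
    ];
    for p in to_proto(&metrics) {
        println!("{:?}", p);
    }
}
```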