matthewmturner opened a new issue, #10012:
URL: https://github.com/apache/arrow-datafusion/issues/10012
### Is your feature request related to a problem or challenge?
I've had to write a couple `ExecutionPlanVisitor`s recently (see below) and
when I started I initially looked for some documentation on this but wasn't
able to find any. I think it would be beneficial to new comers to see a couple
examples of `ExecutionPlanVisitor` in the docs.
### Describe the solution you'd like
A section in the docs with some example implementations of
`ExecutionPlanVisitor`
### Describe alternatives you've considered
_No response_
### Additional context
These were the `ExecutionPlanVisitor`s I made. I would be happy to add docs
around these.
```
struct FileScanVisitor {
file_scan_config: Option<FileScanConfig>,
}
impl ExecutionPlanVisitor for FileScanVisitor {
type Error = anyhow::Error;
fn pre_visit(&mut self, plan: &dyn ExecutionPlan) -> Result<bool,
Self::Error> {
let maybe_parquet_exec =
plan.as_any().downcast_ref::<ParquetExec>();
if let Some(parquet_exec) = maybe_parquet_exec {
self.file_scan_config =
Some(parquet_exec.base_config().clone());
}
Ok(true)
}
}
fn get_file_scan_config(plan: Arc<dyn ExecutionPlan>) ->
Option<FileScanConfig> {
let mut visitor = FileScanVisitor {
file_scan_config: None,
};
visit_execution_plan(plan.as_ref(), &mut visitor).unwrap();
visitor.file_scan_config
}
```
```
#[derive(Debug)]
struct ParquetVisitor;
impl ExecutionPlanVisitor for ParquetVisitor {
type Error = DataFusionError;
fn pre_visit(&mut self, plan: &dyn ExecutionPlan) -> Result<bool,
Self::Error> {
// Get the one-line representation of the ExecutionPlan, something
like this:
// ParquetExec: file_groups=[...], ...
let mut buf = String::new();
write!(&mut buf, "{}", displayable(plan).one_line()).map_err(|e| {
DataFusionError::Internal(format!("Error while collecting
metrics: {e}"))
})?;
// Trim everything up to the first colon.
// This is a hack to extract a human-readable representation of the
ExecutionPlan's type.
// We would prefer if `ExecutionPlan` had `name` method, but this
will do,
// since every physical operator seems to follow this convention.
// If a node doesn't, we just skip collecting its metrics, and no
harm is done.
let plan_type = match buf.split_once(':') {
None => {
println!("execution plan has unexpected display format:
{buf}");
return Ok(true);
}
Some((name, _)) => name.to_string(),
};
let maybe_parquet_exec = plan.as_any().downcast_ref::<ParquetExec>();
match maybe_parquet_exec {
Some(parquet_exec) => {
let metrics = match parquet_exec.metrics() {
None => return Ok(true),
Some(metrics) => metrics,
};
// println!("Metrics: {:?}", metrics);
let bytes_scanned = metrics.sum_by_name("bytes_scanned");
println!("Parquet Bytes scanned: {:?}", bytes_scanned);
}
None => {
}
}
Ok(true)
}
}
```
I had in mind having two examples, one for getting information from parquet
files (I could probably combine the two I had) and one that tracked data across
all nodes (maybe `output_rows`).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]