pgwhalen commented on code in PR #21768:
URL: https://github.com/apache/datafusion/pull/21768#discussion_r3318615780
##########
docs/source/user-guide/explain-usage.md:
##########
@@ -240,6 +240,46 @@ When predicate pushdown is enabled, `DataSourceExec` with `ParquetSource` gains
 - `row_pushdown_eval_time`: time spent evaluating row-level filters
 - `page_index_eval_time`: time required to evaluate the page index filters
+## Postgres-style `EXPLAIN (...)` options
+
+In addition to the legacy keyword form (`EXPLAIN ANALYZE VERBOSE FORMAT tree SELECT ...`),
+DataFusion accepts a Postgres-style option list on dialects whose
+[`supports_explain_with_utility_options`](https://docs.rs/sqlparser/latest/sqlparser/dialect/trait.Dialect.html#method.supports_explain_with_utility_options)
+returns `true`. This includes the default `GenericDialect`, `PostgreSqlDialect`, and
+`DuckDbDialect`, among others.
+
+```sql
+EXPLAIN (ANALYZE, VERBOSE, METRICS 'rows,bytes', LEVEL dev)
+SELECT ... ;
+```
+
+The recognized options are:
+
+| Option | Argument | Effect |
+| --------- | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
+| `ANALYZE` | boolean, optional | Execute the plan and collect metrics. Defaults to `TRUE` when bare. Equivalent to the `ANALYZE` keyword. |
+| `VERBOSE` | boolean, optional | Show per-partition metrics and additional detail. Equivalent to the `VERBOSE` keyword. |
+| `FORMAT` | identifier/string | One of `indent`, `tree`, `pgjson`, `graphviz`. Equivalent to the `FORMAT <format>` clause. |
+| `METRICS` | string | Filter `ANALYZE` metrics by category. Accepts `'all'`, `'none'`, or any comma-separated subset of `rows,bytes,timing,uncategorized`. |
+| `LEVEL` | identifier/string | `summary` or `dev`. Controls metric verbosity for `ANALYZE`. |

Review Comment:
Overall I think this is an excellent translation of the Postgres EXPLAIN semantics to datafusion.
One thought: when I see `dev` in a context like this, some of my reactions are:

- Should I only be using this in a development environment?
- Is it so costly that turning it on in prod could be problematic?
- Is the data only useful to developers?

It seems like the answer to those is mostly no. If I had to choose another word I would probably choose `detailed`, which I don't believe is too overloaded in this context. Not sure what else would make sense.

Super minor though. I think there are good arguments for `dev`; I just wanted to share my thoughts, and I do recognize that `MetricType::Dev` has already been merged.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
