[PR] Cache common plan properties to eliminate recursive calls in physical plan [arrow-datafusion]

via GitHub Mon, 26 Feb 2024 07:09:49 -0800


mustafasrepo opened a new pull request, #9346:
URL: https://github.com/apache/arrow-datafusion/pull/9346

## Which issue does this PR close?

Closes #.

## Rationale for this change
In a great analysis in issue
[9084](https://github.com/apache/arrow-datafusion/issues/9084), @gruuya
found that stack usage (depth) increases significantly during logical and
physical planning. The root causes of this heavy stack usage are:
- In logical planning, excessive use of `.clone` on the `LogicalPlan`
enum.
- In physical planning, recursive function calls in the getter APIs
of `Arc<dyn ExecutionPlan>`, such as `equivalence_properties`,
`output_partitioning`, `output_ordering`, etc.
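
To illustrate the second point, here is a minimal sketch of how a getter that delegates to its input turns one call into a chain of nested calls as deep as the plan. The `Plan` trait, `Scan`, `Passthrough`, and `build_chain` are hypothetical stand-ins invented for this example, not DataFusion types, and the `calls` counter exists only to make the nesting visible.

```rust
use std::sync::Arc;

/// Hypothetical, simplified stand-in for an execution plan node.
trait Plan {
    /// Returns the number of output partitions.
    /// `calls` is incremented once per node visited (illustration only).
    fn output_partitions(&self, calls: &mut usize) -> usize;
}

/// Leaf node: knows its own partition count.
struct Scan {
    partitions: usize,
}

/// Unary node that does not change partitioning, so it asks its input.
struct Passthrough {
    input: Arc<dyn Plan>,
}

impl Plan for Scan {
    fn output_partitions(&self, calls: &mut usize) -> usize {
        *calls += 1;
        self.partitions
    }
}

impl Plan for Passthrough {
    fn output_partitions(&self, calls: &mut usize) -> usize {
        *calls += 1;
        // Recursive delegation: one more stack frame per plan level.
        self.input.output_partitions(calls)
    }
}

/// Builds a leaf wrapped in `levels` passthrough nodes.
fn build_chain(levels: usize, partitions: usize) -> Arc<dyn Plan> {
    let mut plan: Arc<dyn Plan> = Arc::new(Scan { partitions });
    for _ in 0..levels {
        plan = Arc::new(Passthrough { input: plan });
    }
    plan
}
```

Calling `output_partitions` on the root of `build_chain(5, 4)` visits all six nodes before returning, so the number of nested frames grows linearly with plan depth.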

In [PR9084](https://github.com/apache/arrow-datafusion/issues/9084),
@gruuya was able to reduce physical plan stack usage by caching
`equivalence_properties` for `ProjectionExec`.

This PR introduces a new struct, `PlanPropertiesCache`, to cache plan
properties. With this struct, `schema`, `output_partitioning`,
`equivalence_properties`, and `output_ordering` are cached. This caching
mechanism removes recursive calls from the getter methods. Also, once the
`.cache` method is implemented, the default implementations of
`.output_partitioning`, `.equivalence_properties`, and `.output_ordering`
work out of the box.
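
The idea can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `CachedProps`, the `Plan` trait, `Scan`, and `Filter` are hypothetical names, and the real `PlanPropertiesCache` holds richer types than a partition count and a list of column names.

```rust
use std::sync::Arc;

/// Hypothetical, simplified stand-in for the cached properties.
#[derive(Clone, Debug, PartialEq)]
struct CachedProps {
    partitions: usize,
    ordering: Vec<String>,
}

/// Hypothetical, simplified stand-in for the plan trait.
trait Plan {
    /// The only method a node must implement: return its precomputed cache.
    fn cache(&self) -> &CachedProps;

    /// Default getters read from the cache; they never call into the input.
    fn output_partitions(&self) -> usize {
        self.cache().partitions
    }

    fn output_ordering(&self) -> &[String] {
        &self.cache().ordering
    }
}

/// Leaf node: its properties are given at construction time.
struct Scan {
    cache: CachedProps,
}

/// Unary node that preserves its input's partitioning and ordering.
struct Filter {
    #[allow(dead_code)] // kept for execution; the getters do not read it
    input: Arc<dyn Plan>,
    cache: CachedProps,
}

impl Scan {
    fn new(partitions: usize, ordering: Vec<String>) -> Self {
        Scan { cache: CachedProps { partitions, ordering } }
    }
}

impl Filter {
    fn new(input: Arc<dyn Plan>) -> Self {
        // Computed once, from the child's already-cached properties.
        let cache = input.cache().clone();
        Filter { input, cache }
    }
}

impl Plan for Scan {
    fn cache(&self) -> &CachedProps {
        &self.cache
    }
}

impl Plan for Filter {
    fn cache(&self) -> &CachedProps {
        &self.cache
    }
}
```

In this sketch each node computes its properties once in its constructor, reading only its direct child's cache, so a getter call on the root returns a stored value regardless of how deep the plan is.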

With these changes, stack depth decreases considerably, since recursive
calls are eliminated in the physical plan.
As an example, the flame graph for query 54 changes from the following graph

![flamegraph_main_q54](https://github.com/synnada-ai/datafusion-upstream/assets/106137913/7ca67ebb-8153-479c-8c34-2f52a0040608)

to the following graph

![flamegraph_branch_q54](https://github.com/synnada-ai/datafusion-upstream/assets/106137913/d4fe4b90-ecc5-47d1-b8a4-93be2c49384d).

## What changes are included in this PR?

## Are these changes tested?

Existing tests should continue to pass.

## Are there any user-facing changes?

`api change`
