berkaysynnada commented on PR #8891: URL: https://github.com/apache/arrow-datafusion/pull/8891#issuecomment-1932036893
# TreeNode Design Proposal We have refactored many parts in TreeNode API in Datafusion to serve more understandable and easy to use tool. Now, our focus is on these items: 1) Recursion control variants: There are 2 types of recursion control enumerations, `RewriteRecursion` and `VisitRecursion`. `VisitRecursion` has 3 variants; `Continue`, `Skip`, `Stop`, and `RewriteRecursion` has one more; `mutate`. Especially `Skip` behavior is used with a few different logic, and the user would have difficulty what to expect. 2) Visitor and Rewriter API’s can replicate what the other one does. I believe there is no reason to have these 2, so they can be unified. 3) Transform functions do not have a recursion control logic. It will be very useful for many existing rules and future ones. Stopping the traverse or skipping the subtree are some used behaviors. What this document proposes is mainly focused on these problems. The design details will be explained with reference to the tree below: <img src="https://github.com/apache/arrow-datafusion/assets/124376117/ad8a2c77-c551-4f7b-bfa7-37418bf633ed" width="35%" height="auto"> <pre style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"><code class="language-rust">pub struct Transformed<T> { pub data: T, pub transformed: bool, pub tnr: TreeNodeRecursion, } pub enum TreeNodeRecursion { Continue, Jump, Stop, } </code></pre><p style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">With the new<span class="Apple-converted-space"> </span><code>Transformed</code><span class="Apple-converted-space"> </span>struct, we can store the results of the operation results for each node. That will provide us more wise traversals. Previous<span class="Apple-converted-space"> </span><code>Skip</code><span class="Apple-converted-space"> </span>variant is somehow confusing, and have different meanings in different context. In the new suggested desing, it is replaced with<span class="Apple-converted-space"> </span><code>Jump</code>. During top-down traversals, it means that the current subtree will be skipped, and if there exists ano ther child subtree, the traversal is continued from there. If there does not, it means the same with the<span class="Apple-converted-space"> </span><code>Stop</code>, halts the traversal and collect all transformation till that node. During bottom-up traversals,<span class="Apple-converted-space"> </span><code>Jump</code><span class="Apple-converted-space"> </span>makes the walk on the current subtree skipped till the first multi-child node.</p><p style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"><code>Skip</code><span class="Apple-converted-space"> </span>is interpreted in many other libraries as “the current node” is skipped, and the execution continues from the next node. However, our results are co llected after the rule works on the current node. Therefore; we cannot use<span class="Apple-converted-space"> </span><code>Skip</code><span class="Apple-converted-space"> </span>to skip self node. Instead, it could have a meaning skipping the next node. Rather than using<span class="Apple-converted-space"> </span><code>Skip</code><span class="Apple-converted-space"> </span>in that way, the rules can have their own handlings to skip that node with<span class="Apple-converted-space"> </span><code>Transformed::No</code><span class="Apple-converted-space"> </span>and<span class="Apple-converted-space"> </span><code>Recursion::Continue</code>. To not replicate the behavior, I suggest to remove<span class="Apple-converted-space"> </span><code>Skip</code>. Its meaning becomes also more complicated during post-order (bottom-up) traversals.</p><p style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: no rmal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">If is there anyone thinking that a Skip_Next_Node variant could be useful, please share your thoughts.</p><h1 style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">Transform API</h1><p style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"><strong>Pre-Order (Top-Down) Traversal :</strong><span class="Apple-converted-space"> </span>J → I → F → E → C → B → D → A → G → H</p><p style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"><em>Example Scenario for a Top-Down Traversal:</em></p><p style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">After op is applied on J; (op(J)):</p> Continue | Jump | Stop -- | -- | -- apply op(I) | terminate with Transformed::Yes if op(J) is Transformed::Yes | terminate with Transformed::Yes if op(J) is Transformed::Yes <p style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">The new transform API as @peter-toth proposed,</p><pre style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"><code class="language-rust">fn transform<FD, FU>( self, f_down: &mut FD, f_up: &mut FU, ) -> Result<Transformed<Self>> where FD: FnMut(Self) -> Result<Transformed<Self>>, FU: FnMut(Self) -> Result<Transformed<Self>>, </code></pre><p style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">we now do not need such an implementation. The same functionality is provided via Visitor / Rewriter API in a more simple way. Its details are as follows:</p><h1 style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">Visitor / Rewriter API</h1><p style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orpha ns: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">Visitor (<code>visit()</code>) and Rewriter (<code>rewrite()</code>) are doing similar things, and as @peter-toth have mentioned, their behavior under recursion control variants differ in a complicated way. I believe we should unify these API’s into a single one without losing functionality.</p><p style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"><strong>Pre-Visit + Post-Visit:</strong></p><p style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal ; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">Pre-Visit(J) → Pre-Visit(I) → Pre-Visit(F) → Pre-Visit(E) → Pre-Visit(C) →</p><p style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">Pre-Visit(B) → Post-Visit(B) → Pre-Visit(D) → Pre-Visit(A) → Post-Visit(A) →</p><p style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">Post-Visit(D) → Post-Visit(C) → Post-Visit(E) → Pre-Visit(G) → Pre-Visit(H) →</p><p style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">Post-Visit(H) → Post-Visit(G) → Post-Visit(F) → Post-Visit(I) → Post-Visit(J)</p><p style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">All<span class="Apple-converted-space"> </span><code>Continue</code>and<span class="Apple-converted-space"> </span><code>Sto p</code><span class="Apple-converted-space"> </span>perform the same functionality here. If<span class="Apple-converted-space"> </span><code>Jump</code><span class="Apple-converted-space"> </span>is resulted from a<span class="Apple-converted-space"> </span><code>pre_visit()</code><span class="Apple-converted-space"> </span>operation, traversal is skipped after that node, and returns back from that node with<span class="Apple-converted-space"> </span><code>post_visit()</code>. If it is from a<span class="Apple-converted-space"> </span><code>post_visit()</code><span class="Apple-converted-space"> </span>operation, it does the same in<span class="Apple-converted-space"> </span><code>transform_up()</code>, makes the walk on the current subtree skipped till the first multi-child node.</p><p style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">Users can implement<span class="Apple-converted-space"> </span><span class="Apple-converted-space"> </span><code>pre_visit()</code><span class="Apple-converted-space"> </span>and<span class="Apple-converted-space"> </span><code>post_visit()</code><span class="Apple-converted-space"> </span>methods. If not implemented, they simply returns the self with “Continue” and “Transformed::No”.</p><p style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"><em>Example Scenario for Visitor :</em></p> <img src="https://github.com/apache/arrow-datafusion/assets/124376117/eb0361e3-4bd1-48c1-82cd-6b32dae7b846" width="35%" height="auto"> <img src="https://github.com/apache/arrow-datafusion/assets/124376117/0d32911f-26ed-4779-b068-cc7ed4b29bcc" width="35%" height="auto"> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
