NGA-TRAN opened a new issue, #18251:
URL: https://github.com/apache/datafusion/issues/18251

   ### Is your feature request related to a problem or challenge?
   
   This task is part of feature 
https://github.com/apache/datafusion/issues/18249,  showing join order 
enumeration for Q5 using a Join Ranking, where higher ranks indicate earlier 
execution. It assumes ranks are precomputed and focuses solely on enumeration 
and pruning. Rank computation will be covered in a separate ticket.
   
   ## Enumeration Examples
   
   Figure 2 shows three joins ranked 4, one ranked 3, and two ranked 2. The 
three rank-4 joins are selected first in the join order, initiating three 
in-progress plans as illustrated in Figure 3. In these examples, we assume the 
join output remains sorted on the join key if the larger input is already 
sorted. This reflects the behavior of merge joins, which preserve sort order. 
Hash joins, on the other hand, do not generally guarantee output order, though 
some engines may preserve probe-side order in specific cases.
   
   <img width="326" height="340" alt="Image" 
src="https://github.com/user-attachments/assets/a0c5dc39-383f-46cb-8d34-8a9d17d7c1fa";
 />
   Figure 2: Join Graph Annotated with Join Ranked
   
   After a join is selected, remaining join ranks are recalculated using 
updated properties and may change from earlier values.
   
   <img width="1064" height="373" alt="Image" 
src="https://github.com/user-attachments/assets/efc3c4c6-9882-4312-b269-49deb059c146";
 />
   Figure 3: Level-1 Partial Plans
   
   In this enumeration scheme/examples, join type matters: if inputs are 
sorted, both merge and hash joins are considered; otherwise, only hash joins 
are used. As a result, the three plans above expand into five plans shown in 
Figure 4.
   
   <img width="1344" height="385" alt="Image" 
src="https://github.com/user-attachments/assets/0ab1a5bc-5140-4dfa-be07-b5aa1e6beac2";
 />
   Figure 4: Level-1 Partial Plans with Join Type Selection
   
   To avoid plan space explosion during enumeration, we retain at most two 
plans after each step. A pruning strategy is applied to discard less promising 
options. We'll cover pruning later; for now, we assume it eliminates plans 
(1.2) and (2), as shown in Figure 5.
   
   <img width="1360" height="381" alt="Image" 
src="https://github.com/user-attachments/assets/002cee6a-a4de-4442-8f2a-b99da7e5f668";
 />
   Figure 5: Level-1 Partial Plans After Join Type Selection and Pruning
   
   Similarly, the remaining level-1 partial plans (1.1) and (3) generate three 
level-2 plans, as shown in Figure 6.
   
   <img width="1064" height="362" alt="Image" 
src="https://github.com/user-attachments/assets/993e1888-aff8-4af2-aa47-1cfb42b52737";
 />
   Figure 6: Level-2 Partial Plans
   
   Then plan (3.1) is pruned because it shares the same structure as plan 
(1.1.1).
   
   <img width="1062" height="376" alt="Image" 
src="https://github.com/user-attachments/assets/759b530c-3eff-4dfb-b100-483255d7bc64";
 />
   Figure 7: Level-2 Partial Plans After Pruning
   
   Continuing, level-2 plans (1.1.1) and (3.2) each contain a single 
highest-ranked join (rank 3), resulting in two level-3 plans shown in Figure 8.
   
   <img width="538" height="212" alt="Image" 
src="https://github.com/user-attachments/assets/687dc456-05d9-4f33-a3e8-7d228a5300bd";
 />
   Figure 8: Level-3 Partial Plans
   
   Continuing, Figures 9 and 10 show the level-4 plans before and after 
pruning, respectively.
   
   <img width="1184" height="208" alt="Image" 
src="https://github.com/user-attachments/assets/261093cf-e2a1-4b0d-a662-a69fdd9dfbe8";
 />
   Figure 9: Level-4 Partial Plans
   
   <img width="1202" height="191" alt="Image" 
src="https://github.com/user-attachments/assets/19f50e00-12ab-4e38-97f5-4b079328a575";
 />
   Figure10: Level-4 Partial Plans after Pruning
   
   Finally, Figure 11 shows the two final-level plans, with plan (3.2.1.1.1) 
selected as the best.
   
   <img width="460" height="204" alt="Image" 
src="https://github.com/user-attachments/assets/2243e007-2eaa-4e36-8ee8-da3c08850dfd";
 />
   Figure11: Final-level Plans
   
   ## Enumeration Summary
   
   As shown, the enumeration process depends on three customizable components:
    - A join ranking strategy, which can vary and be tailored as needed
    - A pruning mechanism, which can also be adapted
    - An option to include or exclude join types (i.e., physical plans) during 
enumeration
   
   This suggests we can design a flexible enumeration framework that avoids 
relying on fixed join rankings or pruning strategies.
   
   
   
   ### Describe the solution you'd like
   
   _No response_
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to