sgrebnov commented on PR #13267:
URL: https://github.com/apache/datafusion/pull/13267#issuecomment-2465963391

   @findepi - sorry, my bad, I should have provided more context.
   
   We use DataFusion with [DataFusion 
Federation](https://github.com/spiceai/datafusion-federation/) to convert user 
queries into a LogicalPlan and then detect which parts of the plan belong to 
external execution engines. Those parts are converted back to SQL (unparsed) 
and executed by the remote engines as part of the overall query execution. In 
the scenario below, for example, the corresponding sub-plans are unparsed and 
executed by external engines (MySQL and PostgreSQL), while the final 
aggregations or joins are processed by DataFusion. All of this happens as part 
of DataFusion's execution logic.
   
   When multiple external engines are involved, only parts of the main plan are 
converted (see example below), so without optimized/pushed-down projections we 
end up fetching all columns. With projection optimization, the required columns 
are propagated to child nodes so that only those columns are fetched. The goal 
is therefore to keep the projection column pruning optimization enabled and 
still be able to unparse the logical plan back to SQL afterward. Please let me 
know if I should elaborate on the challenges with the unparser after the 
optimization rules are applied.
   
   ```
                     ┌────────────────────────┐
                     │   Join / Aggregation   │               B and C are
                     └────────────────────────┘               available in an
                                  ▲                           external database
                                  │                           DBMS-2 (PostgreSQL)
                                  │
   A is available in an           │                           Unparse -> SQL
   external database in           ├─────────────────────┐
   DBMS-1 (MySQL).                │┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─│─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐
                     ┌────────────┘                     │
   Unparse -> SQL    │             │                    │                      │
                     │                          ┌───────┴──────┐
                     │             │            │     Join     │               │
       ┌ ─ ─ ─ ─ ─ ─ ┼ ─ ─ ─ ─ ─ ┐              └───────▲──────┘
                     │             │                    │                      │
       │             │           │            ┌─────────┴──────────┐
            ┌────────┴───────┐     │          │                    │           │
       │    │     Scan A     │   │            │                    │
            └────────────────┘     │ ┌────────────────┐   ┌────────────────┐   │
       │                         │   │     Scan B     │   │     Scan C     │
                                   │ └────────────────┘   └────────────────┘   │
       │                         │
        ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
   ```
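
To make the column-fetching concern concrete, here is a minimal toy sketch in Python. It does not use DataFusion or DataFusion Federation APIs: the `Scan` and `Join` classes, the `prune` and `unparse` helpers, and the table and column names are all hypothetical and exist only for illustration. It models the B/C sub-plan from the diagram and shows how the SQL sent to the remote engine changes once the required columns are propagated to the scans.

```python
from dataclasses import dataclass, replace
from typing import Set, Tuple, Union


@dataclass(frozen=True)
class Scan:
    """Toy table scan; `columns` are the columns it will fetch."""
    table: str
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Join:
    """Toy inner join; `on` is (left column, right column)."""
    left: "Plan"
    right: "Plan"
    on: Tuple[str, str]


Plan = Union[Scan, Join]


def prune(plan: Plan, required: Set[str]) -> Plan:
    """Propagate required columns to child nodes (toy projection pushdown).

    Assumes column names are unique across tables, which keeps the sketch short.
    """
    if isinstance(plan, Scan):
        kept = tuple(c for c in plan.columns if c in required)
        return replace(plan, columns=kept)
    # A join also needs its own key columns, even if the parent does not.
    needed = required | set(plan.on)
    return Join(prune(plan.left, needed), prune(plan.right, needed), plan.on)


def unparse(plan: Plan) -> str:
    """Turn the toy plan back into a SQL string for the remote engine."""
    if isinstance(plan, Scan):
        return f"SELECT {', '.join(plan.columns)} FROM {plan.table}"
    left_col, right_col = plan.on
    return (
        f"SELECT * FROM ({unparse(plan.left)}) AS l "
        f"JOIN ({unparse(plan.right)}) AS r ON l.{left_col} = r.{right_col}"
    )


# Sub-plan that would be sent to DBMS-2 in the diagram: Join(Scan B, Scan C).
scan_b = Scan("b", ("b_id", "b_name", "b_payload"))
scan_c = Scan("c", ("c_id", "c_b_id", "c_total"))
sub_plan = Join(scan_b, scan_c, on=("b_id", "c_b_id"))

# Suppose the parent Join / Aggregation only needs these two columns.
pruned = prune(sub_plan, {"b_name", "c_total"})

print(unparse(sub_plan))  # fetches every column of b and c
print(unparse(pruned))    # fetches only the join keys and required columns
```

In this sketch the unpruned sub-plan pulls `b_payload` and `c_id` over the network even though nothing upstream uses them, while the pruned one fetches only the join keys plus the two required columns. The real optimizer rewrites the plan in more ways than this, which is where the unparser challenges mentioned above come from.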


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

