[GitHub] [arrow-datafusion] waynexia commented on pull request #2648: WIP: Support logical plan compilation

GitBox Sun, 29 May 2022 22:22:05 -0700


waynexia commented on PR #2648:
URL: 
https://github.com/apache/arrow-datafusion/pull/2648#issuecomment-1140710977


   Hi @andygrove, thanks for your review.
   
   My proposal is to apply JIT compilation in the phase where we convert the 
logical plan into a physical execution plan (in the `PhysicalPlanner`). The 
compiled JIT plan is then just one kind of physical plan. The new flow looks 
like this:
   ```
          Logical Plan
               │
      ┌────────▼──────────┐
      │ PhysicalPlanner   │
      │ (with JITContext) │
      └───┬────────────┬──┘
          │            │
          ▼            ▼
   JITExecPlan     other ExecutionPlan
   ```
   I believe that in this phase we have enough information about how to 
"physically" execute a SQL query. The JIT module should only replace some 
existing `ExecutionPlan`s with their compiled variants, for example physical 
projection with JIT projection, or physical HashJoin with JIT HashJoin.
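   
   A minimal sketch of that replacement step, under loud assumptions: 
`LogicalNode`, `PhysicalNode`, the `supported` list, and this toy 
`PhysicalPlanner` are hypothetical stand-ins and not DataFusion's actual types. 
It only illustrates picking a compiled variant when the JIT context can handle 
a node and falling back to the regular plan otherwise.

```rust
/// Hypothetical, simplified logical plan node (not DataFusion's `LogicalPlan`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum LogicalNode {
    Projection,
    Filter,
    HashJoin,
    Sort,
}

/// Hypothetical physical plan: either a compiled variant or the regular one.
#[derive(Debug, PartialEq)]
enum PhysicalNode {
    Jit(LogicalNode),
    Regular(LogicalNode),
}

/// Hypothetical JIT context that knows which operators it can compile.
struct JitContext {
    supported: Vec<LogicalNode>,
}

/// Toy planner: the JIT context is optional, so planning works without it.
struct PhysicalPlanner {
    jit: Option<JitContext>,
}

impl PhysicalPlanner {
    fn create_physical_plan(&self, node: LogicalNode) -> PhysicalNode {
        match &self.jit {
            // Use the compiled variant only when the JIT context supports it.
            Some(ctx) if ctx.supported.contains(&node) => PhysicalNode::Jit(node),
            // Everything else keeps its existing physical plan.
            _ => PhysicalNode::Regular(node),
        }
    }
}
```

   With `supported` set to projection and hash join, a `Sort` node would still 
get its regular plan, which is the "only replace some plans" behaviour 
described above.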
   
   From another perspective, I think it is not easy to compile a physical 
plan. The main reason is that we can translate a logical expr to a JIT expr, 
but not a physical expr to a JIT expr. A logical expr is a kind of AST, which 
is easy to compile, whereas a physical expr is an "opaque" operation.
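   
   To illustrate the difference, here is a small sketch. All names in it are 
hypothetical: `LogicalExpr` is a toy AST, `Op` is a flat stack program standing 
in for real JIT IR, and the `PhysicalExpr` trait here is a simplified stand-in 
rather than DataFusion's trait. The AST can be lowered by pattern matching, 
while the trait object exposes nothing to match on.

```rust
/// Toy logical expression: a plain tree that can be inspected.
#[derive(Debug, Clone, PartialEq)]
enum LogicalExpr {
    Column(usize),
    Literal(i64),
    Add(Box<LogicalExpr>, Box<LogicalExpr>),
    Gt(Box<LogicalExpr>, Box<LogicalExpr>),
}

/// Hypothetical stand-in for JIT IR: a flat stack-machine program.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    LoadColumn(usize),
    Const(i64),
    Add,
    Gt,
}

/// Lowering is a straightforward post-order walk over the tree.
fn lower(expr: &LogicalExpr, out: &mut Vec<Op>) {
    match expr {
        LogicalExpr::Column(i) => out.push(Op::LoadColumn(*i)),
        LogicalExpr::Literal(v) => out.push(Op::Const(*v)),
        LogicalExpr::Add(l, r) => {
            lower(l, out);
            lower(r, out);
            out.push(Op::Add);
        }
        LogicalExpr::Gt(l, r) => {
            lower(l, out);
            lower(r, out);
            out.push(Op::Gt);
        }
    }
}

/// Runs the lowered program on one row; comparisons yield 1 or 0.
fn run(program: &[Op], row: &[i64]) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    for op in program {
        match op {
            Op::LoadColumn(i) => stack.push(row[*i]),
            Op::Const(v) => stack.push(*v),
            Op::Add => {
                let r = stack.pop().unwrap();
                let l = stack.pop().unwrap();
                stack.push(l + r);
            }
            Op::Gt => {
                let r = stack.pop().unwrap();
                let l = stack.pop().unwrap();
                stack.push((l > r) as i64);
            }
        }
    }
    stack.pop().unwrap()
}

/// Simplified physical expression: only evaluation is exposed, so a compiler
/// has no structure to walk without extra hooks.
trait PhysicalExpr {
    fn evaluate(&self, row: &[i64]) -> i64;
}
```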
   
   However, this is not infeasible, as shown in [this 
paper](http://www.vldb.org/pvldb/vol7/p853-klonatos.pdf). One of its ideas is 
to have the operators that normally operate on data generate IR instead, and 
then use that IR to process the data. We could achieve something like
   ```rust
   // add a new method to this existing trait
   trait ExecutionPlan {
       fn jit_compile(&self, jit_ctx: JITContext) {
           // add the logic to JIT Context. E.g. a filter plan:
           jit_ctx.expr(
               // let column_c = column_a + column_b
               // if column_c > 0, materialize column_d to output
           )
       }
   }
   
   fn jit_exec(plan: &dyn ExecutionPlan, ctx: JITContext) {
       // let every physical plan register its logic with the context,
       // then finalize that logic into an executable program.
       ctx.compile(plan);
       ctx.execute();
   }
   ```
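   A more concrete version of the sketch above, under stated assumptions: 
boxed closures stand in for generated IR, and `Row`, `Stage`, `JitContext`, 
`FilterExec`, and this simplified `ExecutionPlan` trait are all hypothetical. 
It is not real code generation; it only shows operators registering their logic 
with a context that then runs everything in one fused per-row loop.

```rust
/// A row with three integer columns; a stand-in for a record batch.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Row {
    a: i64,
    b: i64,
    d: i64,
}

/// A registered piece of operator logic. Returning `None` drops the row.
type Stage = Box<dyn Fn(Row) -> Option<Row>>;

/// Hypothetical context that collects operator logic before running it.
#[derive(Default)]
struct JitContext {
    stages: Vec<Stage>,
}

impl JitContext {
    fn register(&mut self, stage: Stage) {
        self.stages.push(stage);
    }

    /// Runs all registered stages in one loop over the input rows.
    fn execute(&self, input: &[Row]) -> Vec<Row> {
        input
            .iter()
            .filter_map(|row| self.stages.iter().try_fold(*row, |r, stage| stage(r)))
            .collect()
    }
}

/// Simplified stand-in for the existing trait, with the proposed method.
trait ExecutionPlan {
    fn jit_compile(&self, ctx: &mut JitContext);
}

/// Filter from the comment above: keep rows where `a + b > 0`.
struct FilterExec;

impl ExecutionPlan for FilterExec {
    fn jit_compile(&self, ctx: &mut JitContext) {
        ctx.register(Box::new(|row: Row| {
            let c = row.a + row.b;
            if c > 0 {
                Some(row)
            } else {
                None
            }
        }));
    }
}

/// Lets every plan register its logic, then materializes column `d`.
fn jit_exec(plans: &[&dyn ExecutionPlan], input: &[Row]) -> Vec<i64> {
    let mut ctx = JitContext::default();
    for plan in plans {
        plan.jit_compile(&mut ctx);
    }
    ctx.execute(input).into_iter().map(|row| row.d).collect()
}
```

   Calling `jit_exec` with a single `FilterExec` returns the `d` values of the 
rows whose `a + b` is positive.
   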
   I haven't considered or compared these two approaches in depth. At the 
current stage, I think they differ only in how we structure the 
implementation, but they may influence future topics, such as minimizing the 
memory footprint or optimizing the generated code on our side.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
