waynexia commented on PR #2648:
URL: https://github.com/apache/arrow-datafusion/pull/2648#issuecomment-1140710977
Hi @andygrove, thanks for your review.
My proposal is to apply JIT compilation in the phase where we convert the logical plan into a physical execution plan (i.e. in the `PhysicalPlanner`). The compiled JIT plan is then just one kind of physical plan. The new flow looks like this:
```
          Logical Plan
               │
     ┌─────────▼─────────┐
     │  PhysicalPlanner  │
     │ (with JITContext) │
     └───┬───────────┬───┘
         │           │
         ▼           ▼
    JITExecPlan   other ExecutionPlan
```
I believe that at this phase we have enough information about how to "physically" execute a SQL query. The JIT module would only replace some of the existing `ExecutionPlan`s with their compiled variants, e.g. a physical projection with a JIT projection, or a physical HashJoin with a JIT HashJoin.
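To make the planner-side idea concrete, here is a minimal, self-contained Rust sketch. All types in it (`LogicalPlan`, `ExecutionPlan`, `JitContext`, `JitProjectionExec`, and so on) are hypothetical, heavily simplified stand-ins rather than DataFusion's real API. The only point it shows is that the planner, when a JIT context is present, returns a compiled variant for operators that have one and keeps the existing operator for everything else.

```rust
// Hypothetical, heavily simplified stand-ins; these are not DataFusion's real types.
enum LogicalPlan {
    Projection,
    HashJoin,
    Sort,
}

trait ExecutionPlan {
    fn name(&self) -> &'static str;
}

// Existing (non-compiled) physical operators.
struct ProjectionExec;
struct HashJoinExec;
struct SortExec;

// Compiled variants the planner may produce when a JIT context is available.
struct JitProjectionExec;
struct JitHashJoinExec;

impl ExecutionPlan for ProjectionExec { fn name(&self) -> &'static str { "ProjectionExec" } }
impl ExecutionPlan for HashJoinExec { fn name(&self) -> &'static str { "HashJoinExec" } }
impl ExecutionPlan for SortExec { fn name(&self) -> &'static str { "SortExec" } }
impl ExecutionPlan for JitProjectionExec { fn name(&self) -> &'static str { "JitProjectionExec" } }
impl ExecutionPlan for JitHashJoinExec { fn name(&self) -> &'static str { "JitHashJoinExec" } }

/// Hypothetical placeholder for whatever state the JIT needs.
struct JitContext;

struct PhysicalPlanner {
    jit: Option<JitContext>,
}

impl PhysicalPlanner {
    /// Swap in a compiled variant where one exists; otherwise keep the existing operator.
    fn create_physical_plan(&self, plan: &LogicalPlan) -> Box<dyn ExecutionPlan> {
        let jit = self.jit.is_some();
        match plan {
            LogicalPlan::Projection if jit => Box::new(JitProjectionExec),
            LogicalPlan::Projection => Box::new(ProjectionExec),
            LogicalPlan::HashJoin if jit => Box::new(JitHashJoinExec),
            LogicalPlan::HashJoin => Box::new(HashJoinExec),
            // No compiled variant (yet): fall back to the regular operator.
            LogicalPlan::Sort => Box::new(SortExec),
        }
    }
}

fn main() {
    let with_jit = PhysicalPlanner { jit: Some(JitContext) };
    let without_jit = PhysicalPlanner { jit: None };
    for plan in [LogicalPlan::Projection, LogicalPlan::HashJoin, LogicalPlan::Sort] {
        let a = with_jit.create_physical_plan(&plan).name();
        let b = without_jit.create_physical_plan(&plan).name();
        println!("{} / {}", a, b);
    }
}
```

Keeping the choice inside the planner means operators without a compiled variant need no changes at all.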
From another perspective, I don't think it is easy to compile a physical plan. The main reason is that we can translate a logical expr into a JIT expr, but not a physical expr. A logical expr is a kind of AST, which is easy to compile, whereas a physical expr is an "opaque" operation.
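The difference can be illustrated with a toy example, assuming hypothetical types (`Expr`, `Instr`, `compile`, `run`, `OpaqueExpr`) and a tiny stack machine standing in for real JIT IR. The AST version of `column_a + column_b > 0` can be walked node by node and translated into instructions, while a closure computing the same predicate can only be called.

```rust
use std::collections::HashMap;

/// A toy logical expression: an AST whose structure a compiler can walk.
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Gt(Box<Expr>, Box<Expr>),
}

/// Instructions for a tiny stack machine, standing in for real JIT IR.
#[derive(Debug, PartialEq)]
enum Instr {
    Load(String),
    Push(i64),
    Add,
    Gt,
}

/// Because `Expr` is an AST, compiling it is a post-order walk emitting one instruction per node.
fn compile(expr: &Expr, out: &mut Vec<Instr>) {
    match expr {
        Expr::Column(name) => out.push(Instr::Load(name.clone())),
        Expr::Literal(v) => out.push(Instr::Push(*v)),
        Expr::Add(l, r) => {
            compile(l, out);
            compile(r, out);
            out.push(Instr::Add);
        }
        Expr::Gt(l, r) => {
            compile(l, out);
            compile(r, out);
            out.push(Instr::Gt);
        }
    }
}

/// Execute the compiled instructions against one row (booleans are encoded as 0/1).
fn run(program: &[Instr], row: &HashMap<String, i64>) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    for instr in program {
        match instr {
            Instr::Load(name) => stack.push(row[name]),
            Instr::Push(v) => stack.push(*v),
            Instr::Add | Instr::Gt => {
                let r = stack.pop().expect("missing right operand");
                let l = stack.pop().expect("missing left operand");
                stack.push(if matches!(instr, Instr::Add) { l + r } else { (l > r) as i64 });
            }
        }
    }
    stack.pop().expect("empty program")
}

/// An "opaque" physical expression: it can be called, but there is no tree to inspect or compile.
type OpaqueExpr = Box<dyn Fn(&HashMap<String, i64>) -> i64>;

fn main() {
    // column_a + column_b > 0
    let logical = Expr::Gt(
        Box::new(Expr::Add(
            Box::new(Expr::Column("a".to_string())),
            Box::new(Expr::Column("b".to_string())),
        )),
        Box::new(Expr::Literal(0)),
    );
    let mut program = Vec::new();
    compile(&logical, &mut program);
    println!("{:?}", program); // [Load("a"), Load("b"), Add, Push(0), Gt]

    let row: HashMap<String, i64> = HashMap::from([("a".to_string(), 2), ("b".to_string(), 3)]);
    println!("compiled: {}", run(&program, &row));

    let opaque: OpaqueExpr = Box::new(|row: &HashMap<String, i64>| (row["a"] + row["b"] > 0) as i64);
    println!("opaque: {}", opaque(&row));
}
```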
However, it is not infeasible, as shown in [this paper](http://www.vldb.org/pvldb/vol7/p853-klonatos.pdf). One idea is to let the operators that process data generate IR, and then use that IR to process the data. We could achieve something like:
```rust
// Add a new method to the existing trait.
// Sketch only: `JITContext` and its methods here are illustrative.
trait ExecutionPlan {
    fn jit_compile(&self, jit_ctx: &mut JITContext) {
        // Register this operator's logic with the JIT context. E.g. a filter plan:
        jit_ctx.expr(
            // let column_c = column_a + column_b
            // if column_c > 0, materialize column_d to output
        )
    }
}

fn jit_exec(plan: &dyn ExecutionPlan, mut ctx: JITContext) {
    // Let every physical plan register its logic with the context,
    // then finalize that logic into an executable program.
    ctx.compile(plan);
    ctx.execute()
}
```
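Below is a runnable toy version of the sketch above, under several assumptions: `Op`, `JitContext`, `FilterExec`, and `ProjectionExec` are hypothetical and much simpler than DataFusion's types, rows are plain `Vec<i64>`, and `compile` fuses the registered IR into a single closure that interprets it per row, standing in for real machine-code generation. Unlike the sketch, `jit_exec` here also takes the input width and the rows, so the example runs on its own.

```rust
/// Toy IR emitted by operators (a stand-in for real JIT IR).
#[derive(Debug, PartialEq)]
enum Op {
    /// Append `row[lhs] + row[rhs]` as a new column.
    Add { lhs: usize, rhs: usize },
    /// Drop the row unless `row[col] > value`.
    FilterGt { col: usize, value: i64 },
    /// Keep only these columns, in this order.
    Project(Vec<usize>),
}

/// Hypothetical JIT context: collects IR from every operator, then "compiles" it.
struct JitContext {
    ops: Vec<Op>,
    width: usize, // number of columns at the current point of the pipeline
}

impl JitContext {
    fn new(input_width: usize) -> Self {
        JitContext { ops: Vec::new(), width: input_width }
    }

    /// Emit an addition and return the index of the new column.
    fn add(&mut self, lhs: usize, rhs: usize) -> usize {
        self.ops.push(Op::Add { lhs, rhs });
        self.width += 1;
        self.width - 1
    }

    fn filter_gt(&mut self, col: usize, value: i64) {
        self.ops.push(Op::FilterGt { col, value });
    }

    fn project(&mut self, cols: Vec<usize>) {
        self.width = cols.len();
        self.ops.push(Op::Project(cols));
    }

    /// Stand-in for code generation: fuse all ops into one per-row function.
    /// A real JIT would emit machine code here instead of interpreting `ops`.
    fn compile(self) -> impl Fn(&[i64]) -> Option<Vec<i64>> {
        let ops = self.ops;
        move |input: &[i64]| {
            let mut row = input.to_vec();
            for op in &ops {
                match op {
                    Op::Add { lhs, rhs } => {
                        let sum = row[*lhs] + row[*rhs];
                        row.push(sum);
                    }
                    Op::FilterGt { col, value } if row[*col] <= *value => return None,
                    Op::FilterGt { .. } => {}
                    Op::Project(cols) => row = cols.iter().map(|&c| row[c]).collect(),
                }
            }
            Some(row)
        }
    }
}

/// The proposed extension: each physical operator registers its logic.
trait ExecutionPlan {
    fn jit_compile(&self, ctx: &mut JitContext);
}

/// The filter from the example: column_c = column_a + column_b; keep rows where column_c > 0.
struct FilterExec { a: usize, b: usize }

impl ExecutionPlan for FilterExec {
    fn jit_compile(&self, ctx: &mut JitContext) {
        let c = ctx.add(self.a, self.b);
        ctx.filter_gt(c, 0);
    }
}

/// A projection on top of another operator; its child registers its logic first.
struct ProjectionExec { input: Box<dyn ExecutionPlan>, cols: Vec<usize> }

impl ExecutionPlan for ProjectionExec {
    fn jit_compile(&self, ctx: &mut JitContext) {
        self.input.jit_compile(ctx);
        ctx.project(self.cols.clone());
    }
}

/// Let every operator register its logic, finalize it into one program, then run it.
fn jit_exec(plan: &dyn ExecutionPlan, input_width: usize, rows: &[Vec<i64>]) -> Vec<Vec<i64>> {
    let mut ctx = JitContext::new(input_width);
    plan.jit_compile(&mut ctx);
    let program = ctx.compile();
    rows.iter().filter_map(|row| program(row.as_slice())).collect()
}

fn main() {
    // Input columns are [a, b, d]; output d only for rows where a + b > 0.
    let plan = ProjectionExec { input: Box::new(FilterExec { a: 0, b: 1 }), cols: vec![2] };
    let rows = vec![vec![1, 2, 10], vec![-3, 1, 20], vec![0, 5, 30]];
    println!("{:?}", jit_exec(&plan, 3, &rows));
}
```

Because the projection asks its child to register first, the filter and the projection end up in one fused program rather than two separate operators exchanging batches.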
I haven't considered or compared these two approaches in depth. At the current stage, I think they only differ in how we structure our implementation. However, they may influence future work, such as minimizing the memory footprint or optimizing the generated code on our side.
--
This is an automated message from the Apache Git Service.