advancedxy commented on issue #121:
URL:
https://github.com/apache/arrow-datafusion-comet/issues/121#issuecomment-1981343136
I did some research on supporting `InSubqueryExec`. I think we should postpone
this a little, at least until `Comet` supports Join operators.
`InSubqueryExec` is mainly used for:
1. DPP (dynamic partition pruning), which evaluates the `IN` predicate on the
driver side.
2. Some special cases, which actually perform the `inSet` evaluation on the
executor side (for Comet, the native side).
For the first part, it would be pretty straightforward to support on the
`Comet` side, as all the evaluation happens on the driver (JVM) side. We can
model that like `InSubqueryExec`: prepare the subqueries first, then do any
necessary expression and plan transforms, and we are good to go. However, DPP
applies to Join operators, so it would be reasonable to add DPP support after
we have Join operators in Comet.
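
A rough sketch of that driver-side flow, to make the "prepare first, then transform" order concrete. This is plain Python for illustration only; `prepare_subquery`, `prune_partitions`, and the partition map are hypothetical names, not Spark or Comet APIs.

```python
from typing import Any, Callable, Dict, Iterable, List, Set


def prepare_subquery(run_subquery: Callable[[], Iterable[Any]]) -> Set[Any]:
    """Run the subquery once on the driver and collect its distinct values."""
    return set(run_subquery())


def prune_partitions(partitions: Dict[Any, List[str]], allowed: Set[Any]) -> List[str]:
    """Keep only the files whose partition value appears in the subquery result."""
    files: List[str] = []
    for value, paths in partitions.items():
        if value in allowed:
            files.extend(paths)
    return files
```

The point of the sketch is that nothing here has to run on the native side: the scan only ever sees the pruned file list.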
For the second part, it's slightly more complicated. Per my understanding, we
have multiple options:
1. Like we did for `ScalarSubqueryExec`, we can add an `InSubquery`
PhysicalExpr implementation. The main problem is how to transfer the list data
from the JVM to the native side. I'm skeptical about just passing the Java
object array via a JNI call, as the list might be pretty big. Maybe we should
convert it to a RecordBatch/CometVector and then pass that to the native
side?
2. Instead of implementing `InSubquery`, we can rewrite it as an `InSet`
expression, as we already have the subquery list collected before we
actually execute the plan. The problems are:
- Currently, we don't have a way to rewrite/transform the native
operator after we have created it.
- The proto message probably has a size limit, something like 64MB? It
will not work for a huge `inSet`.
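
To illustrate option 2 together with the size concern, here is a minimal sketch of a rewrite that only produces an `InSet` when the collected list fits a budget and otherwise leaves the node alone for another path (e.g. option 1). The node classes, `estimated_size`, and `rewrite_in_subquery` are hypothetical stand-ins written in plain Python, and the 64MB budget is just the guess from above, not a confirmed limit.

```python
from dataclasses import dataclass
from typing import Any, Tuple


# Hypothetical stand-ins for expression nodes; not Spark or Comet classes.
@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class InSubquery:
    child: Column
    values: Tuple[Any, ...]  # subquery result, already collected on the driver


@dataclass(frozen=True)
class InSet:
    child: Column
    values: frozenset


# Assumed budget for the serialized plan (the 64MB guess, not a verified limit).
MAX_PLAN_BYTES = 64 * 1024 * 1024


def estimated_size(values) -> int:
    """Crude estimate: UTF-8 length of each value's string form."""
    return sum(len(str(v).encode("utf-8")) for v in values)


def rewrite_in_subquery(expr, max_bytes: int = MAX_PLAN_BYTES):
    """Return an InSet if the collected list fits the budget, else the original node."""
    if isinstance(expr, InSubquery) and estimated_size(expr.values) <= max_bytes:
        return InSet(expr.child, frozenset(expr.values))
    return expr
```

With a guard like this, small lists could go through the proto message as an `InSet`, while large ones would still need a separate transfer mechanism.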
cc @viirya @sunchao: I'd appreciate it if you have more insights on this
topic.