viirya commented on issue #121: URL: https://github.com/apache/arrow-datafusion-comet/issues/121#issuecomment-1981778664
> For the first part, it would be pretty straightforward to support in the Comet side as all the evaluations happens at the driver(/JVM) side.

Hm? For DPP, the only difference is that we don't need to broadcast the evaluation result. I'm not sure about "all evaluations happen at the driver side". I think the subplan still needs to be executed on executors, as `ScalarSubqueryExec` does.

> Like we did for ScalarSubqueryExec, we can add a InSubquery PhysicalExpr implementation. The main problem is how to transform the list data from JVM to the native side. I'm skeptical to just transfer the java object array via the JNI call as the list might be pretty big. Maybe we should transform that to a RecordBatch/CometVector and then pass it back to the native side?

I think we can start with a simple initial version that passes Java objects through a JNI call. Alternatively, we can keep the output of the subplan in the JVM and evaluate the JVM `InSet` expression through a JNI call from the native side. `InSet` evaluation should be fast, as it's only a hash table lookup.

> Instead of implementing InSubquery, we can rewrite it with the InSet expression as we have already has the subquery list collected before we actually execute the plan.

The problem is that `InSubqueryExec` is a subquery expression, which is different from `InSet`. I'm not sure how you plan to rewrite it to an `InSet` while also keeping the subquery plan.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
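
To illustrate the remark that `InSet` evaluation is essentially a hash table lookup, here is a minimal sketch in Rust. It is not Comet or Spark code: `InSetSketch` and its methods are hypothetical names, the values are restricted to 64-bit integers, and the null handling assumes SQL three-valued `IN` semantics (a null input yields null, and a miss against a list that contains a null also yields null).

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for an InSet-style predicate over 64-bit integers.
/// Not part of Comet, DataFusion, or Spark.
struct InSetSketch {
    /// Non-null values collected from the subquery result.
    values: HashSet<i64>,
    /// Whether the collected list contained at least one null.
    has_null: bool,
}

impl InSetSketch {
    /// Build the hash set once from the already collected list.
    fn new(list: &[Option<i64>]) -> Self {
        let mut values = HashSet::new();
        let mut has_null = false;
        for item in list {
            match item {
                Some(v) => {
                    values.insert(*v);
                }
                None => has_null = true,
            }
        }
        InSetSketch { values, has_null }
    }

    /// Evaluate one input value; `None` models SQL null.
    /// Assumed semantics: null input -> null; hit -> true;
    /// miss with a null in the list -> null; otherwise false.
    fn eval(&self, input: Option<i64>) -> Option<bool> {
        match input {
            None => None,
            Some(v) if self.values.contains(&v) => Some(true),
            Some(_) if self.has_null => None,
            Some(_) => Some(false),
        }
    }

    /// Evaluate a whole column: one hash lookup per row.
    fn eval_column(&self, column: &[Option<i64>]) -> Vec<Option<bool>> {
        column.iter().map(|v| self.eval(*v)).collect()
    }
}
```

The set is built once and each row costs a single hash probe, so the per-row cost does not depend on the size of the list. The cost of getting the list (or each probe) across the JNI boundary is a separate question that this sketch does not address.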
