viirya commented on issue #121:
URL: 
https://github.com/apache/arrow-datafusion-comet/issues/121#issuecomment-1981778664

   > For the first part, it would be pretty straightforward to support on the 
Comet side, as all the evaluations happen at the driver (JVM) side. 
   
   Hm? For DPP, the only difference is that we don't need to broadcast the 
evaluation result. I'm not sure about "all evaluations happen at the driver 
side". I think the subplan still needs to be executed on the executors, as it 
is for `ScalarSubqueryExec`.
   
   > Like we did for ScalarSubqueryExec, we can add an InSubquery PhysicalExpr 
implementation. The main problem is how to transfer the list data from the JVM 
to the native side. I'm skeptical about just transferring the Java object array 
via the JNI call, as the list might be pretty big. Maybe we should transform 
that to a RecordBatch/CometVector and then pass it back to the native side?
   
   I think we can start with a simple initial version that passes Java 
objects through a JNI call. Alternatively, we can keep the output of the 
subplan in the JVM and evaluate the JVM `InSet` expression through a JNI call 
from the native side. `InSet` evaluation should be fast, as it's only a hash 
table lookup.
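   
   To make the second option concrete, here is a rough sketch of what the 
JVM side could look like. `InSetLookup` and its `contains` method are 
hypothetical names for illustration only, not existing Comet or Spark APIs. 
The sketch also assumes the subquery output is a list of non-null `long` 
values and ignores SQL null semantics.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Comet or Spark: holds the collected
// subquery output on the JVM side and answers membership probes in batches.
final class InSetLookup {
    private final Set<Long> values;

    InSetLookup(long[] subqueryResult) {
        values = new HashSet<>(subqueryResult.length * 2);
        for (long v : subqueryResult) {
            values.add(v);
        }
    }

    // Intended to be invoked from native code through JNI with one batch
    // of probe values, so the JNI crossing cost is paid once per batch.
    boolean[] contains(long[] probes) {
        boolean[] result = new boolean[probes.length];
        for (int i = 0; i < probes.length; i++) {
            result[i] = values.contains(probes[i]);
        }
        return result;
    }
}
```

   Probing a whole batch per call, rather than one value per call, is a 
choice made in this sketch to keep the number of JNI crossings small. The 
subquery list itself never leaves the JVM.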
   
   > Instead of implementing InSubquery, we can rewrite it with the InSet 
expression, as we already have the subquery list collected before we 
actually execute the plan. The problem is that:
   
   `InSubqueryExec` is a subquery expression, which is different from `InSet`. 
I'm not sure how you plan to rewrite it to an `InSet` while also keeping the 
subquery plan.
   

