advancedxy commented on issue #121:
URL: 
https://github.com/apache/arrow-datafusion-comet/issues/121#issuecomment-1982192147

   > I'm not sure about "all evaluations happen at the driver side"
   
   Emmm, I should be more specific. I meant that all the `in` evaluations 
happen at the driver side. The subplan/subquery of course needs to be 
executed/prepared first.
   
   > I think we can have an initial version which does simple first by passing 
java objects through JNI call
   
   Hmm, this is always a valid option to go with. I'm wondering whether it 
would be simple enough to just convert the list of literals into an Arrow 
ColumnVector, since then we could reuse all the existing infrastructure.
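   To make the idea concrete, here is a minimal sketch of packing the 
subquery's literal results into a value buffer plus validity mask, the 
same shape an Arrow vector has. `LiteralColumn` and `fromLiterals` are 
hypothetical names for illustration, not Comet or Arrow APIs:

   ```java
   import java.util.*;

   // Illustrative sketch only: a literal IN-list laid out column-wise,
   // mimicking an Arrow vector's data buffer + validity (null) mask.
   class LiteralColumn {
       final int[] values;     // stand-in for an Arrow data buffer
       final boolean[] nulls;  // stand-in for an Arrow validity mask

       LiteralColumn(int[] values, boolean[] nulls) {
           this.values = values;
           this.nulls = nulls;
       }

       static LiteralColumn fromLiterals(List<Integer> literals) {
           int[] vals = new int[literals.size()];
           boolean[] nulls = new boolean[literals.size()];
           for (int i = 0; i < literals.size(); i++) {
               Integer v = literals.get(i);
               nulls[i] = (v == null);
               vals[i] = (v == null) ? 0 : v; // arbitrary fill for nulls
           }
           return new LiteralColumn(vals, nulls);
       }
   }
   ```

   Once the list is in this shape, shipping it through JNI is no different 
from shipping any other column.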
   
   > Or, instead, we can keep the output of subplan in JVM, and we evaluate the 
JVM InSet expression through JNI call from native side. As InSet evaluation 
should be fast as it's only a hash table lookup.
   
   I'm not sure about this approach, and I never considered it as an option. 
The problem with evaluating the `InSet` expression on the JVM side is that it 
would require a RecordBatch-to-InternalRow conversion on the native side 
(`InSubqueryExec` requires an InternalRow to evaluate the child), then passing 
the rows back to the JVM and fetching the result back. That generally defeats 
columnar execution, doesn't it?
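   By contrast, if the set lives on the native side, `InSet` can be evaluated 
over a whole column in one pass with no per-row round trips. A toy sketch 
(plain Java standing in for the native columnar kernel; `ColumnarInSet` is 
an illustrative name, not a Comet class):

   ```java
   import java.util.*;

   // Illustrative only: InSet evaluated batch-at-a-time over a column,
   // one hash-table lookup per value, with no RecordBatch -> InternalRow
   // conversion and no JNI round trip per row.
   class ColumnarInSet {
       static boolean[] evaluate(int[] column, Set<Integer> inSet) {
           boolean[] result = new boolean[column.length];
           for (int i = 0; i < column.length; i++) {
               result[i] = inSet.contains(column[i]);
           }
           return result;
       }
   }
   ```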
   
   > I'm not sure how do you plan to rewrite it to an InSet but also keep the 
subquery plan.
   
   Currently, in 
`org.apache.spark.sql.comet.CometNativeExec#doExecuteColumnar`, instead of 
simply copying the serializedPlan message, we would convert it back into an 
operator message and transform the `InSubquery` expr into an `InSet` expr. 
There's no utility to support that yet, though; or maybe we will never go 
that way.
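   The shape of that rewrite would be roughly the following. Everything here 
(`Expr`, `InSubquery`, `InSet`, `Rewriter`) is a simplified stand-in for 
Comet's operator/expression messages, just to show the transform, and I'm 
assuming the subquery results are already materialized on the driver:

   ```java
   import java.util.*;

   // Toy expression tree: stand-ins for Comet's protobuf expression messages.
   interface Expr {}
   record ColumnRef(String name) implements Expr {}
   record InSubquery(Expr child, String subqueryId) implements Expr {}
   record InSet(Expr child, Set<Object> values) implements Expr {}

   // Replaces InSubquery nodes with InSet nodes once the subplan has been
   // executed and its results are available (keyed by subquery id here).
   class Rewriter {
       final Map<String, Set<Object>> subqueryResults;

       Rewriter(Map<String, Set<Object>> subqueryResults) {
           this.subqueryResults = subqueryResults;
       }

       Expr rewrite(Expr e) {
           if (e instanceof InSubquery in) {
               return new InSet(rewrite(in.child()),
                               subqueryResults.get(in.subqueryId()));
           }
           return e; // leaves and already-rewritten nodes pass through
       }
   }
   ```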


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
