advancedxy commented on issue #121: URL: https://github.com/apache/arrow-datafusion-comet/issues/121#issuecomment-1982192147
> I'm not sure about "all evaluations happen at the driver side"

Emmm, I should be more specific: I meant all the `in` evaluations happen on the driver side. The subplan/subquery of course needs to be executed/prepared first.

> I think we can have an initial version which does simple first by passing java objects through JNI call

Hmm, that is always a valid option. I'm wondering whether it's simple enough to just convert the list of literals into an Arrow ColumnVector, since we would be reusing all the existing infrastructure.

> Or, instead, we can keep the output of subplan in JVM, and we evaluate the JVM InSet expression through JNI call from native side. As InSet evaluation should be fast as it's only a hash table lookup.

I'm not sure about this approach, and I never considered it as an option. The problem with evaluating the `InSet` expression on the JVM side is that we would then need a RecordBatch-to-InternalRow conversion on the native side (`InSubqueryExec` requires an InternalRow to evaluate the child), pass it back to the JVM, and get back the result. That generally defeats columnar execution, doesn't it?

> I'm not sure how do you plan to rewrite it to an InSet but also keep the subquery plan.

Currently, in `org.apache.spark.sql.comet.CometNativeExec#doExecuteColumnar`, instead of simply copying the serializedPlan message, we would convert it back to an operator message and transform the `InSubQuery` expr into an `InSet` expr. There's no utility to support that yet, though; or maybe we are never going to go that way.
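The rewrite idea in the last paragraph can be sketched with a hypothetical, simplified expression tree. Note that the `Expr`, `InSubquery`, `InSet`, `And` types and `PlanRewriter.rewrite` below are invented for illustration; they are not Comet's actual protobuf messages or Spark's expression classes. The point is only that once the subquery has been evaluated on the driver, the dynamic `InSubquery` node collapses to a static hash-set membership test:

```java
import java.util.Map;
import java.util.Set;

// Toy expression tree standing in for the serialized operator message.
sealed interface Expr permits Column, InSubquery, InSet, And {}
record Column(String name) implements Expr {}
record InSubquery(Expr child, long subqueryId) implements Expr {}
record InSet(Expr child, Set<Object> values) implements Expr {}
record And(Expr left, Expr right) implements Expr {}

final class PlanRewriter {
  /**
   * Driver-side rewrite: the subqueries keyed by id have already been
   * executed, so every InSubquery node can be replaced by an InSet node
   * holding the materialized result set before the plan is (re-)serialized
   * for the native side.
   */
  static Expr rewrite(Expr expr, Map<Long, Set<Object>> subqueryResults) {
    if (expr instanceof InSubquery in) {
      return new InSet(rewrite(in.child(), subqueryResults),
                       subqueryResults.get(in.subqueryId()));
    }
    if (expr instanceof And and) {
      return new And(rewrite(and.left(), subqueryResults),
                     rewrite(and.right(), subqueryResults));
    }
    return expr; // leaves (e.g. Column) are returned unchanged
  }
}
```

With this shape the native side only ever sees a plain `InSet`, so no per-row callback into the JVM is needed, which is what keeps execution columnar.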
