andygrove commented on PR #1793: URL: https://github.com/apache/datafusion-comet/pull/1793#issuecomment-2917789948
> I still think there is a bug here:
>
> For this test (when running on main):
>
> ```scala
> test("debug datafusion native filter") {
>   val schema = StructType(
>     Seq(
>       StructField("row_idx", IntegerType, nullable = false),
>       StructField("int", IntegerType, nullable = false)))
>
>   val data = DataGenerator.DEFAULT.generateRows(1000, schema)
>
>   withSQLConf(
>     CometConf.COMET_EXPLAIN_VERBOSE_ENABLED.key -> "true",
>     CometConf.COMET_EXPLAIN_NATIVE_ENABLED.key -> "true",
>     CometConf.COMET_SPARK_TO_ARROW_SUPPORTED_OPERATOR_LIST.key -> "RDDScan") {
>     val df = spark
>       .createDataFrame(spark.sparkContext.parallelize(data, 1), schema)
>       .where(col("row_idx") < 10000 || col("row_idx") > 10010)
>
>     df.explain(true)
>     df
>       .show()
>   }
> }
> ```
>
> The spark plan is:
>
> ```
> == Parsed Logical Plan ==
> 'Filter (('row_idx < 10000) OR ('row_idx > 10010))
> +- LogicalRDD [row_idx#2, int#3], false
>
> == Analyzed Logical Plan ==
> row_idx: int, int: int
> Filter ((row_idx#2 < 10000) OR (row_idx#2 > 10010))
> +- LogicalRDD [row_idx#2, int#3], false
>
> == Optimized Logical Plan ==
> Filter ((row_idx#2 < 10000) OR (row_idx#2 > 10010))
> +- LogicalRDD [row_idx#2, int#3], false
>
> == Physical Plan ==
> *(2) CometColumnarToRow
> +- CometFilter [row_idx#2, int#3], ((row_idx#2 < 10000) OR (row_idx#2 > 10010))
>    +- CometSparkRowToColumnar
>       +- *(1) Scan ExistingRDD[row_idx#2,int#3]
> ```
>
> and the datafusion plan is:
>
> ```
> 25/05/28 19:17:14 INFO core/src/execution/jni_api.rs: Comet native query plan:
> FilterExec: col_0@0 < 10000 OR col_0@0 > 10010
>   ScanExec: source=[CometSparkRowToColumnar (unknown)], schema=[col_0: Int32, col_1: Int32]
> ```
>
> It is using DataFusion Filter and not CometFilter while it should use comet filter as there is reuse, no?

In this example, Spark (not Comet) is performing the scan. Comet is then performing the row-to-columnar conversion. The `native_comet` scan is not being used so there is no need to use Comet's filter.
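
To make that reasoning concrete, here is a rough sketch of the decision as described in this thread. It is a toy model only: `FilterChoiceSketch`, `ScanSource`, `NativeFilter`, and `chooseFilter` are hypothetical names for illustration and are not Comet's actual planner or serde code.

```scala
// Toy model of the rule discussed above; every name here is hypothetical.
object FilterChoiceSketch {

  // Where the batches feeding the filter come from.
  sealed trait ScanSource
  // Comet's own `native_comet` scan (the case where "reuse" is a concern).
  case object NativeCometScan extends ScanSource
  // Spark performs the scan, then CometSparkRowToColumnar converts the rows.
  case object SparkScanRowToColumnar extends ScanSource

  // Which native filter implementation the plan ends up with.
  sealed trait NativeFilter
  case object CometFilterImpl extends NativeFilter
  case object DataFusionFilterExec extends NativeFilter

  // Assumption taken from this thread: Comet's filter is only needed
  // when the input comes from the `native_comet` scan.
  def chooseFilter(source: ScanSource): NativeFilter = source match {
    case NativeCometScan        => CometFilterImpl
    case SparkScanRowToColumnar => DataFusionFilterExec
  }
}
```

Under this model, the test above falls into the `SparkScanRowToColumnar` case, which matches the `FilterExec` shown in the native plan.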