andygrove commented on code in PR #1657: URL: https://github.com/apache/datafusion-comet/pull/1657#discussion_r2050729883
##########
docs/source/user-guide/compatibility.md:
##########
@@ -40,19 +40,23 @@ Comet currently has three distinct implementations of the Parquet scan operator.
 | `native_datafusion` | This implementation delegates to DataFusion's `ParquetExec`. |
 | `native_iceberg_compat` | This implementation also delegates to DataFusion's `ParquetExec` but uses a hybrid approach of JVM and native code. This scan is designed to be integrated with Iceberg in the future. |
 
-The new (and currently experimental) `native_datafusion` and `native_iceberg_compat` scans are being added to
-provide the following benefits over the `native_comet` implementation:
+The new `native_datafusion` and `native_iceberg_compat` scans provide the following benefits over the `native_comet`
+implementation:
 
-- Leverage the DataFusion community's ongoing improvements to `ParquetExec`
-- Provide support for reading complex types (structs, arrays, and maps)
-- Remove the use of reusable mutable-buffers in Comet, which is complex to maintain
+- Leverages the DataFusion community's ongoing improvements to `ParquetExec`
+- Provides support for reading complex types (structs, arrays, and maps)
+- Removes the use of reusable mutable-buffers in Comet, which is complex to maintain
+- Improved performance
 
-These new implementations are not fully implemented. Some of the current limitations are:
+The new scans have the following limitations:
 
-- Scanning Parquet files containing unsigned 8 or 16-bit integers can produce results that don't match Spark. By default, Comet
-will fall back to Spark when using these scan implementations to read Parquet files containing 8 or 16-bit integers.
-This behavior can be disabled by setting `spark.comet.scan.allowIncompatible=true`.
-- These implementations do not yet fully support timestamps, decimals, or complex types.

Review Comment:
   @parthchandra Do we still have compatibility issues with decimals with the new scans?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: github-h...@datafusion.apache.org