sandugood commented on issue #4723: URL: https://github.com/apache/datafusion-comet/issues/4723#issuecomment-4799748849
Returning with additional info:

1. While debugging and rerunning the pipeline, I once got: `WARN CometIcebergNativeScan: Failed to serialize delete file: null`
2. Additionally, when performing a single `.count()` over the data, Comet's row count is higher than vanilla Spark's.

For context: I am using Spark 4.0.3 (official image) plus additional .jar files inside the image (all of them pulled from a maven-central proxy):

- compiled Comet (from the `main` branch, with the `iceberg-rust` crate also pulled from the `main` branch)
- `iceberg-spark-runtime-4.0_2.13-1.11.0.jar`
- `iceberg-aws-bundle-1.11.0.jar`

For comparison, vanilla Spark has all of the .jar files listed above except the Comet one.

So, right now, I would say that these wrong values come purely from reading with the native Iceberg scan enabled. When I enabled Comet but did not enable the native scan, the results were the same as vanilla Spark.
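The comparison described above (the same `.count()` under vanilla Spark and under Comet with the native Iceberg scan) can be sketched as a small helper. This is only an illustration, not the reproduction actually used: `compare_row_counts` and `count_with` are hypothetical names, and the config key `spark.comet.scan.icebergNative.enabled` is an assumption that should be checked against the Comet build in use.

```python
from typing import Callable, Dict

Conf = Dict[str, str]

# Session settings for the two runs being compared.
VANILLA: Conf = {"spark.comet.enabled": "false"}
COMET_NATIVE_ICEBERG: Conf = {
    "spark.comet.enabled": "true",
    # Assumed key for the native Iceberg scan; verify against your Comet version.
    "spark.comet.scan.icebergNative.enabled": "true",
}


def compare_row_counts(
    count_with: Callable[[Conf], int],
    baseline: Conf,
    candidate: Conf,
) -> Dict[str, int]:
    """Run the same count under two configurations and report the difference.

    `count_with` is a caller-supplied (hypothetical) function that starts a
    Spark session with the given settings, runs `.count()` on the table under
    test, and returns the result as an int.
    """
    baseline_rows = count_with(baseline)
    candidate_rows = count_with(candidate)
    return {
        "baseline": baseline_rows,
        "candidate": candidate_rows,
        # Positive means the candidate configuration returned more rows.
        "extra_rows": candidate_rows - baseline_rows,
    }
```

A positive `extra_rows` for `COMET_NATIVE_ICEBERG` against `VANILLA` would match the symptom reported here. Whether it is related to the delete-file warning is not established by this comment.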
