mbutrovich commented on issue #4611: URL: https://github.com/apache/datafusion-comet/issues/4611#issuecomment-4672188345
Thanks for the detailed report and the reproducer repo, @Neuw84! This is the same classloader issue as #3737, fixed by #4609. The `Could not validate FileScanTasks: org.apache.iceberg.ContentScanTask` message is a `ClassNotFoundException`: `validateIcebergFileScanTasks` was still using single-arg `Class.forName`, which uses Comet's classloader rather than the thread context classloader. Since you load Iceberg via `spark.jars.packages` (a child application classloader) while the Comet jar is on the system classpath, the lookup fails and the scan falls back to Spark, which is why your numbers match vanilla Spark. #4609 swaps those calls to use the thread context classloader, but it landed after 0.16.0 was tagged, so it isn't in the release you're on. Could you try a snapshot build to confirm it resolves your case? As an immediate workaround, putting the Iceberg jars on the same classpath as the Comet jar (e.g. both in Spark's `jars/` folder instead of via `spark.jars.packages`) should also avoid the fallback. Otherwise, 0.17 should be available by the end of next week (tracking in #4564) and will include the fix.
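
For readers unfamiliar with the distinction, here is a minimal Java sketch of the lookup pattern described above. It is not the code from #4609; the class and method names (`ContextClassLoaderLookup`, `loadWithContextClassLoader`, `isVisible`) are hypothetical and only illustrate the general technique. Single-arg `Class.forName(name)` resolves against the classloader that defined the calling class, whereas the three-arg overload lets the caller pass the thread context classloader, which can see jars loaded by a child application classloader.

```java
// Illustrative sketch only: names are hypothetical, not Comet's actual code.
final class ContextClassLoaderLookup {

    /**
     * Resolve a class through the thread context classloader instead of the
     * loader that defined this class (which is what single-arg
     * Class.forName would use).
     */
    static Class<?> loadWithContextClassLoader(String className)
            throws ClassNotFoundException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Assumption: fall back to this class's own loader when no
            // context classloader has been set on the current thread.
            loader = ContextClassLoaderLookup.class.getClassLoader();
        }
        // Three-arg overload: initialize the class and use the given loader.
        return Class.forName(className, true, loader);
    }

    /** Returns true if the class can be resolved via the context classloader. */
    static boolean isVisible(String className) {
        try {
            loadWithContextClassLoader(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The Iceberg class from the error message is only visible when the
        // Iceberg jars are reachable from the current context classloader.
        String name = "org.apache.iceberg.ContentScanTask";
        System.out.println(name + " visible: " + isVisible(name));
    }
}
```

Running `main` inside a Spark application that loads Iceberg via `spark.jars.packages` is one way to check whether the class is reachable from the context classloader; outside such an environment the lookup is expected to report the class as not visible.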
