Neuw84 opened a new issue, #4611:
URL: https://github.com/apache/datafusion-comet/issues/4611
### Describe the bug
Environment: OSS Spark 4.0.1, Comet 0.16, Iceberg 1.10.
It seems that reading Iceberg data from a Glue catalog is not accelerated by
Comet. The performance numbers I get are almost exactly the same as with
vanilla Spark, and the following warning is logged:
```
26/06/09 12:42:01 WARN CometExecRule: Comet cannot execute some parts of this plan natively (set spark.comet.explainFallback.enabled=false to disable this logging):
CometSort
+- CometSinkPlaceHolder
   +- CometColumnarExchange
      +- ObjectHashAggregate
         +- CometSinkPlaceHolder
            +- CometColumnarExchange
               +- ObjectHashAggregate
                  +- BatchScan glue.comet-tests.src_fact [COMET: Iceberg reflection failure: Could not validate FileScanTasks: org.apache.iceberg.ContentScanTask]
```
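For anyone triaging similar logs, here is a minimal sketch of how the fallback reasons could be pulled out of the warning text. It only assumes that reasons appear in `[COMET: ...]` annotations, as in the log above; the helper name `comet_fallback_reasons` is hypothetical and not part of Comet.

```python
import re


def comet_fallback_reasons(plan_text: str) -> list[str]:
    """Return the text of every `[COMET: ...]` annotation in a plan dump.

    Hypothetical helper: it relies only on the annotation shape seen in the
    log above, not on any Comet API.
    """
    # DOTALL lets a reason that was wrapped across several lines still match.
    matches = re.findall(r"\[COMET:\s*(.*?)\]", plan_text, flags=re.DOTALL)
    # Collapse line breaks and repeated spaces introduced by wrapping.
    return [" ".join(m.split()) for m in matches]


if __name__ == "__main__":
    # Excerpt copied from the warning in this issue.
    sample = """+- ObjectHashAggregate
+- BatchScan glue.comet-tests.src_fact [COMET: Iceberg
reflection failure: Could not validate FileScanTasks:
org.apache.iceberg.ContentScanTask]"""
    for reason in comet_fallback_reasons(sample):
        print(reason)
```

On this log it reports a single reason, the Iceberg reflection failure on the `BatchScan` node, which suggests the scan itself is what keeps the plan from running natively.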
### Steps to reproduce
The code can be found in this repo:
https://github.com/Neuw84/spark-comet-iceberg/tree/main
It contains three notebooks:
- Parquet to Parquet
- Parquet to Iceberg
- Iceberg to Iceberg (this should have performance similar to the previous
  one)
### Expected behavior
Performance numbers similar to the Parquet to Iceberg case.
### Additional context
_No response_