Neuw84 opened a new issue, #4611:
URL: https://github.com/apache/datafusion-comet/issues/4611

   ### Describe the bug
   
   OSS Spark 4.0.1, Comet 0.16, Iceberg 1.10
   
   When I read Iceberg data from the Glue catalog, Comet falls back to Spark 
for the scan (see the warning below). The performance numbers that I get are 
almost exactly the same as with vanilla Spark.
   
   ```
   26/06/09 12:42:01 WARN CometExecRule: Comet cannot execute some parts of 
this plan natively (set spark.comet.explainFallback.enabled=false to disable 
this logging):
   CometSort
   +- CometSinkPlaceHolder
      +- CometColumnarExchange
         +- ObjectHashAggregate
            +- CometSinkPlaceHolder
               +- CometColumnarExchange
                  +- ObjectHashAggregate
                     +-  BatchScan glue.comet-tests.src_fact [COMET: Iceberg 
reflection failure: Could not validate FileScanTasks: 
org.apache.iceberg.ContentScanTask]
    ```
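
To make the fallback easier to read, here is a minimal sketch that lists the operators in the plan above that are not running natively. It assumes that Comet-native operators carry the `Comet` prefix, as they do in this plan, and that the fallback reason appears in a `[COMET: ...]` suffix. `fallback_nodes` is a hypothetical helper written for this report, not part of Comet.

```python
import re

# Plan text copied from the warning above (log prefix and e-mail wrapping removed).
PLAN = """\
CometSort
+- CometSinkPlaceHolder
   +- CometColumnarExchange
      +- ObjectHashAggregate
         +- CometSinkPlaceHolder
            +- CometColumnarExchange
               +- ObjectHashAggregate
                  +-  BatchScan glue.comet-tests.src_fact [COMET: Iceberg reflection failure: Could not validate FileScanTasks: org.apache.iceberg.ContentScanTask]
"""


def fallback_nodes(plan: str):
    """Return (operator, reason) for every plan node without the Comet prefix.

    Assumption: native operators are named Comet*, as in the plan above.
    """
    nodes = []
    for line in plan.splitlines():
        # Strip the tree-drawing characters from the left of the node.
        text = line.lstrip(" +-:|")
        if not text:
            continue
        operator = text.split()[0]
        if operator.startswith("Comet"):
            continue
        # The fallback reason, when present, is in a "[COMET: ...]" suffix.
        reason = re.search(r"\[COMET: (.*)\]", text)
        nodes.append((operator, reason.group(1) if reason else None))
    return nodes


for operator, reason in fallback_nodes(PLAN):
    print(operator, "->", reason or "no reason reported")
```

For this plan the helper reports both `ObjectHashAggregate` nodes and the `BatchScan`. Only the `BatchScan` carries a reason, the Iceberg reflection failure.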
   
   
   
   ### Steps to reproduce
   
   The code is in the repo linked below, which contains three notebooks:
   
   - Parquet to Parquet
   - Parquet to Iceberg
   - Iceberg to Iceberg (this should perform similarly to the previous one)
   
   https://github.com/Neuw84/spark-comet-iceberg/tree/main
   
   
   
   ### Expected behavior
   
   Performance numbers similar to those of Parquet to Iceberg.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

