sandugood opened a new issue, #4723:
URL: https://github.com/apache/datafusion-comet/issues/4723

   ### Describe the bug
   
   I was testing Comet in a multi-stage ETL process (multiple joins and a native Iceberg scan). Some of the stages produce a typical Comet execution plan, for example:
   ```
   CometSinkPlaceHolder
   +- CometColumnarExchange
      +- Project
         +- SortMergeJoin
            :- CometSort
            :  +- CometSinkPlaceHolder
            :     +- CometColumnarExchange
            :        +- Project
            :           +- SortMergeJoin
            :              :- CometSort
            :              :  +- CometSinkPlaceHolder
            :              :     +- CometColumnarExchange
            :              :        +- Project
            :              :           +- SortMergeJoin
            :              :              :- CometSort
            :              :              :  +- CometSinkPlaceHolder
            :              :              :     +- CometColumnarExchange
            :              :              :        +- Project
            :              :              :           +- SortMergeJoin
            :              :              :              :- CometSort
            :              :              :              :  +- CometSinkPlaceHolder
            :              :              :              :     +- CometColumnarExchange
            :              :              :              :        +- Project
            :              :              :              :           +- SortMergeJoin
            :              :              :              :              :- Sort
            :              :              :              :              :  +- HashAggregate [COMET: Unsupported data type: TimestampNTZType, Unsupported aggregate expression(s)]
            :              :              :              :              :     +- CometSinkPlaceHolder
            :              :              :              :              :        +- CometColumnarExchange
            :              :              :              :              :           +- HashAggregate
            :              :              :              :              :              +- InMemoryTableScan [COMET: InMemoryTableScan is not supported]
            :              :              :              :              :                 +- InMemoryRelation
            :              :              :              :              :                       +- CometExchange
            :              :              :              :              :                          +- CometProject
            :              :              :              :              :                             +- CometSortMergeJoin
            :              :              :              :              :                                :- CometSort
            :              :              :              :              :                                :  +- CometExchange
            :              :              :              :              :                                :     +- CometProject
            :              :              :              :              :                                :        +- CometFilter
            :              :              :              :              :                                :           +- CometIcebergNativeScan
            :              :              :              :              :                                +- CometSort
            :              :              :              :              :                                   +- CometFilter
            :              :              :              :              :                                      +- CometHashAggregate
            :              :              :              :              :                                         +- CometExchange
            :              :              :              :              :                                            +- CometHashAggregate
            :              :              :              :              :                                               +- CometUnion
            :              :              :              :              :                                                  :- CometProject
            :              :              :              :              :                                                  :  +- CometFilter
            :              :              :              :              :                                                  :     +- CometIcebergNativeScan
            :              :              :              :              :                                                  +- CometIcebergNativeScan
   ```
   
   However, when aggregating the final result (as a sanity check of Comet against Spark), I get inflated values for the `sum` aggregation. The discrepancy is visible even without aggregating: comparing individual per-user values (domain-specific) across the two result tables already shows it.
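   The per-user comparison described above could be scripted roughly as follows. This is a hypothetical Python sketch, assuming both result tables have already been collected into plain dicts keyed by user ID; the function name `diff_per_user`, the tolerance, and the sample data are illustrative and not taken from the actual job.

   ```python
   import math
   from typing import Dict, List, Tuple


   def diff_per_user(
       spark: Dict[str, float],
       comet: Dict[str, float],
       rel_tol: float = 1e-9,
   ) -> List[Tuple[str, float, float]]:
       """Return (user, spark_value, comet_value) for every user whose values disagree.

       A user missing from one side is reported with NaN for the missing value.
       """
       mismatches = []
       # Union of user IDs from both tables, sorted for a stable report.
       for user in sorted(spark.keys() | comet.keys()):
           s = spark.get(user, math.nan)
           c = comet.get(user, math.nan)
           # NaN is never "close" to anything, so missing users are always reported.
           if not math.isclose(s, c, rel_tol=rel_tol):
               mismatches.append((user, s, c))
       return mismatches


   # Hypothetical sample data: only "u2" differs between the two engines.
   spark_result = {"u1": 10.0, "u2": 25.5, "u3": 7.25}
   comet_result = {"u1": 10.0, "u2": 40.5, "u3": 7.25}
   print(diff_per_user(spark_result, comet_result))  # [('u2', 25.5, 40.5)]
   ```

   With the real tables this would list exactly the users whose Comet values are inflated, independently of any final `sum`.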
   Note:
   - I am using `"spark.comet.scan.icebergNative.enabled": "true"`, so the native Iceberg scan is enabled. Both `iceberg-rust` and `iceberg-storage-opendal` are taken from the `main` branch. Why? That branch contains a fix for reading `.parquet` files that do not contain a page index (see https://github.com/apache/iceberg-rust/pull/2693). Before that fix the whole query failed; now it completes but produces results that differ from Spark's.
   - The inflated values are consistent and deterministic across runs, so the problem is not related to whether `spark.comet.exec.strictFloatingPoint` is set to true or false.
   
   ### Steps to reproduce
   
   _No response_
   
   ### Expected behavior
   
   The same values (or nearly the same, up to floating-point precision) from both Spark and Comet.
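   To make "nearly the same, up to floating-point precision" concrete, here is a small hypothetical Python sketch (the data is made up, not from this job): summing the same doubles in a different way can change the last bits of the result, and a relative tolerance absorbs that, whereas a genuinely wrong total would not pass such a check.

   ```python
   import math

   # Hypothetical per-row values; the real data is not part of this report.
   values = [0.1] * 10

   naive = sum(values)        # plain left-to-right summation
   exact = math.fsum(values)  # correctly rounded summation

   # The totals differ only in the last bits of the mantissa...
   print(naive == exact)                            # False
   # ...so a relative tolerance treats them as the same result.
   print(math.isclose(naive, exact, rel_tol=1e-9))  # True
   ```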
   
   ### Additional context
   
   _No response_

