slfan1989 opened a new issue, #2015:
URL: https://github.com/apache/auron/issues/2015

   ### Overview
   
   This PR introduces native scan support for Apache Iceberg Copy-On-Write (COW) tables in the Auron engine, enabling Auron to read Iceberg data files directly and accelerate queries through its native execution engine.
   
   ### Design
   
   #### Architecture Overview 
   The implementation plugs into Auron through the SPI (Service Provider Interface) extension mechanism and consists of three core components. A Spark scan passes through the following stages:
   ```
   Spark Scan → Detect → Validate → Convert → Native Execute
                 (SPI)    (Support)  (Exec)      (JNI)
   ```
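
   As a rough illustration of the Detect stage, the Java sketch below combines the two gates listed for `IcebergConvertProvider` (the `spark.auron.enable.iceberg.scan` toggle and the Spark 3.4 to 4.0 version range) with the class-name check attributed to `IcebergScanSupport`. `IcebergDetect`, its methods, the `org.apache.iceberg.spark.source.` package prefix, and the assumption that the toggle is off when unset are all hypothetical; this is not Auron's actual code.

   ```java
   import java.util.Map;

   // Hypothetical sketch of the Detect stage. IcebergDetect and its methods are
   // illustrative names, not Auron's API. The toggle default and the package
   // prefix used for the class-name check are assumptions.
   final class IcebergDetect {
       static final String ENABLE_KEY = "spark.auron.enable.iceberg.scan";
       static final String ICEBERG_SCAN_PACKAGE = "org.apache.iceberg.spark.source.";

       private IcebergDetect() {}

       /** True if "major.minor[.patch]" is within Spark 3.4 through 4.0, inclusive. */
       static boolean isSupportedSparkVersion(String sparkVersion) {
           String[] parts = sparkVersion.split("\\.");
           if (parts.length < 2) {
               return false;
           }
           try {
               // Encode major.minor as one comparable number: 3.4 -> 304, 4.0 -> 400.
               int key = Integer.parseInt(parts[0]) * 100 + Integer.parseInt(parts[1]);
               return key >= 304 && key <= 400;
           } catch (NumberFormatException e) {
               return false;
           }
       }

       /** Reads the toggle from a Spark-style conf map; assumed to be false when unset. */
       static boolean isEnabled(Map<String, String> conf) {
           return Boolean.parseBoolean(conf.getOrDefault(ENABLE_KEY, "false"));
       }

       /** Detect: feature enabled, supported Spark version, and an Iceberg scan class. */
       static boolean detect(String scanClassName, String sparkVersion, Map<String, String> conf) {
           return isEnabled(conf)
               && isSupportedSparkVersion(sparkVersion)
               && scanClassName.startsWith(ICEBERG_SCAN_PACKAGE);
       }
   }
   ```

   A scan that fails any of these gates simply stays a regular Spark scan. The `META-INF/services` registration mentioned below is the standard `java.util.ServiceLoader` convention: a file named after the SPI interface's fully qualified name lists the implementation classes to instantiate, which is presumably how the provider is discovered without explicit wiring.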
   
   #### Core Modules
   
   - IcebergConvertProvider
      - Implements `AuronConvertProvider` SPI interface
      - Auto-registered via `META-INF/services`
      - Checks Spark version compatibility (supports Spark 3.4 through 4.0)
      - Provides configuration toggle: `spark.auron.enable.iceberg.scan`
   
   - IcebergScanSupport
      - Determines whether the scan comes from the Iceberg data source (via a class-name check)
      - Uses reflection to access Iceberg's internal `SparkInputPartition` and `FileScanTask`
      - Performs multiple checks to determine native scan eligibility:
         - Only supports COW tables (no delete files)
         - Does not support metadata columns (`_file`, `_pos`, etc.)
         - Only supports Parquet and ORC formats
         - Does not support residual filters (row-level filtering)
         - Does not support mixed file formats
         - Only supports Auron-compatible data types
   
   - NativeIcebergTableScanExec
      - Extends `LeafExecNode` and `NativeSupports`
      - Converts Iceberg `FileScanTask` to Spark `FilePartition`
      - Generates Protobuf scan plans (`ParquetScanExecNode` or `OrcScanExecNode`)
      - Registers Hadoop FileSystem resources via JniBridge
      - Implements projection pushdown
      - Handles file splitting and coalescing for partitioned tables
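
   The eligibility rules listed for `IcebergScanSupport` can be modeled as a single check that returns the first reason to fall back to Spark. This is a sketch under stated assumptions: `Column`, `TaskInfo`, `ScanInfo`, and `IcebergEligibility` are hypothetical stand-ins for the data the issue says is read reflectively from `SparkInputPartition` and `FileScanTask`, and the supported-type set is a placeholder, since the issue does not enumerate the Auron-compatible types.

   ```java
   import java.util.List;
   import java.util.Optional;
   import java.util.Set;

   // Hypothetical stand-ins: a projected column, one Iceberg file task, and the whole scan.
   record Column(String name, String type) {}

   record TaskInfo(String fileFormat, int deleteFileCount, boolean hasResidualFilter) {}

   record ScanInfo(List<Column> projection, List<TaskInfo> tasks) {}

   final class IcebergEligibility {
       // Some of Iceberg's reserved metadata column names.
       static final Set<String> METADATA_COLUMNS = Set.of("_file", "_pos", "_spec_id", "_partition", "_deleted");
       static final Set<String> NATIVE_FORMATS = Set.of("PARQUET", "ORC");
       // Placeholder whitelist (assumption): the issue does not list Auron-compatible types.
       static final Set<String> SUPPORTED_TYPES = Set.of("int", "bigint", "double", "string", "date", "timestamp");

       private IcebergEligibility() {}

       /** Empty if the scan may run natively, otherwise the first reason to fall back to Spark. */
       static Optional<String> check(ScanInfo scan) {
           for (Column col : scan.projection()) {
               if (METADATA_COLUMNS.contains(col.name())) {
                   return Optional.of("metadata column " + col.name());
               }
               if (!SUPPORTED_TYPES.contains(col.type())) {
                   return Optional.of("unsupported type " + col.type());
               }
           }
           String firstFormat = null;
           for (TaskInfo task : scan.tasks()) {
               if (task.deleteFileCount() > 0) {
                   return Optional.of("delete files present (not COW)");
               }
               if (task.hasResidualFilter()) {
                   return Optional.of("residual filter");
               }
               if (!NATIVE_FORMATS.contains(task.fileFormat())) {
                   return Optional.of("unsupported format " + task.fileFormat());
               }
               if (firstFormat != null && !firstFormat.equals(task.fileFormat())) {
                   return Optional.of("mixed file formats");
               }
               firstFormat = task.fileFormat();
           }
           // No tasks (an empty table) leaves nothing to reject, so the scan stays eligible.
           return Optional.empty();
       }
   }
   ```

   The sketch inspects every task rather than sampling one, because a single task with delete files or a single ORC file among Parquet files is enough to send the whole scan back to Spark.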
   
   #### Supported Features
   - Currently Supported:
      - Full table scan on Iceberg COW tables
      - Parquet and ORC file formats
      - Projection pushdown (column pruning)
      - Partitioned table queries (partition filtering is handled at the Iceberg layer)
      - Empty table handling
      -  
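
   For the file splitting and coalescing that `NativeIcebergTableScanExec` performs when turning `FileScanTask`s into Spark `FilePartition`s, the sketch below borrows the greedy packing Spark itself uses for file sources: splits are sorted largest first, then packed by `FilePartition.getFilePartitions` into a partition until the next split would push it past the target size, with a fixed "open cost" charged per file. Whether Auron packs Iceberg tasks exactly this way is an assumption, and `FileSplit` and `PartitionPacker` are hypothetical names.

   ```java
   import java.util.ArrayList;
   import java.util.Comparator;
   import java.util.List;

   // Sketch modeled on Spark's file-source packing; FileSplit and PartitionPacker are
   // hypothetical names, and Auron's actual strategy for Iceberg tasks may differ.
   record FileSplit(String path, long start, long length) {}

   final class PartitionPacker {
       private PartitionPacker() {}

       /** Cut one file into consecutive chunks of at most maxSplitBytes. */
       static List<FileSplit> split(String path, long fileLength, long maxSplitBytes) {
           List<FileSplit> out = new ArrayList<>();
           for (long offset = 0; offset < fileLength; offset += maxSplitBytes) {
               out.add(new FileSplit(path, offset, Math.min(maxSplitBytes, fileLength - offset)));
           }
           return out;
       }

       /** Greedily pack splits, largest first; each split also costs openCostInBytes. */
       static List<List<FileSplit>> coalesce(List<FileSplit> splits, long maxSplitBytes, long openCostInBytes) {
           List<FileSplit> sorted = new ArrayList<>(splits);
           sorted.sort(Comparator.comparingLong(FileSplit::length).reversed());

           List<List<FileSplit>> partitions = new ArrayList<>();
           List<FileSplit> current = new ArrayList<>();
           long currentSize = 0;
           for (FileSplit s : sorted) {
               // Close the current partition if adding this split would exceed the target.
               if (!current.isEmpty() && currentSize + s.length() > maxSplitBytes) {
                   partitions.add(current);
                   current = new ArrayList<>();
                   currentSize = 0;
               }
               current.add(s);
               currentSize += s.length() + openCostInBytes;
           }
           if (!current.isEmpty()) {
               partitions.add(current);
           }
           return partitions;
       }
   }
   ```

   In Spark, the target size and per-file cost are derived from `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes`. The open cost is what keeps many tiny files from piling into a single partition: with a 64-byte target and a 4-byte open cost, five 10-byte files pack as four files plus one.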

