slfan1989 opened a new pull request, #2031:
URL: https://github.com/apache/auron/pull/2031

   ### Which issue does this PR close?
   
   Closes #2030
   
   ### Rationale for this change
   
   This PR adds native scan support for Hudi Copy-On-Write (COW) tables,
   enabling Auron to accelerate Hudi table reads by converting
   `FileSourceScanExec` operations to native Parquet/ORC scan implementations.
   
   ### What changes are included in this PR?
   
   #### 1. **New Module: `thirdparty/auron-hudi`**
   - **`HudiConvertProvider`**: Implements `AuronConvertProvider` SPI to
     intercept and convert Hudi `FileSourceScanExec` to native scans
     - Detects Hudi file formats (`HoodieParquetFileFormat`,
       `HoodieOrcFileFormat`)
     - Converts to `NativeParquetScanExec` or `NativeOrcScanExec`
     - Handles timestamp fallback logic automatically
   
   - **`HudiScanSupport`**: Core detection and validation logic
     - File format recognition with `NewHoodie*` format rejection
     - Table type resolution via multi-source metadata fallback:
       - Options → Catalog → `.hoodie/hoodie.properties`
     - MOR table detection and rejection
     - Time travel query detection (via `as.of.instant`, `as.of.timestamp`
       options)
     - FileIndex class hierarchy verification
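
   The detection rules listed above can be sketched roughly as follows. This is only an illustration of the described behavior, not the code in this PR: the object and method names are hypothetical, `hoodie.table.type` is assumed as the lookup key for all three metadata sources, and falling back when the table type cannot be resolved is an assumption.

   ```scala
   import java.util.Locale

   // Illustrative sketch only; names are hypothetical, not Auron's actual API.
   object HudiScanSketch {
     sealed trait Decision
     case object UseNativeScan extends Decision
     final case class FallBack(reason: String) extends Decision

     // Time travel option keys named in the PR description.
     private val timeTravelKeys = Set("as.of.instant", "as.of.timestamp")

     // Assumed key; the real lookup may use a different key per source.
     private val tableTypeKey = "hoodie.table.type"

     /** Maps a file format class name to a native format, rejecting NewHoodie* classes. */
     def nativeFormat(fileFormatClassName: String): Option[String] = {
       val simple = fileFormatClassName.substring(fileFormatClassName.lastIndexOf('.') + 1)
       // "NewHoodieParquetFileFormat" also ends with "HoodieParquetFileFormat",
       // so the rejection has to be checked first.
       if (simple.startsWith("NewHoodie")) None
       else if (simple.endsWith("HoodieParquetFileFormat")) Some("parquet")
       else if (simple.endsWith("HoodieOrcFileFormat")) Some("orc")
       else None
     }

     /** First source that defines a table type wins: options, catalog, hoodie.properties. */
     def resolveTableType(
         options: Map[String, String],
         catalogProps: Map[String, String],
         hoodieProps: => Map[String, String]): Option[String] =
       options.get(tableTypeKey)
         .orElse(catalogProps.get(tableTypeKey))
         .orElse(hoodieProps.get(tableTypeKey)) // by-name: only evaluated if needed
         .map(_.trim.toUpperCase(Locale.ROOT))

     /** Only COW tables without time travel options are eligible for the native scan. */
     def decide(
         options: Map[String, String],
         catalogProps: Map[String, String],
         hoodieProps: => Map[String, String]): Decision =
       if (options.keys.exists(k => timeTravelKeys.contains(k.toLowerCase(Locale.ROOT))))
         FallBack("time travel query")
       else
         resolveTableType(options, catalogProps, hoodieProps) match {
           case Some("COPY_ON_WRITE") => UseNativeScan
           case Some("MERGE_ON_READ") => FallBack("MOR table")
           case Some(other)           => FallBack(s"unsupported table type: $other")
           case None                  => FallBack("table type could not be resolved")
         }
   }
   ```

   Passing `hoodieProps` by name keeps the `.hoodie/hoodie.properties` read lazy in this sketch, so the file would only be touched when neither the options nor the catalog supply a table type.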
   
   #### 2. **Configuration**
   - Added `spark.auron.enable.hudi.scan` config option (default: `true`)
   - Respects existing Parquet/ORC timestamp scanning configurations
   - Runtime Spark version validation (3.0–3.5 only)
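
   As a rough illustration of the runtime version check, a minimal sketch might parse the major and minor components of the Spark version string and accept only 3.0 through 3.5. The object and method names here are hypothetical and are not taken from this PR.

   ```scala
   // Illustrative sketch only; not the validation code added by this PR.
   object SparkVersionCheckSketch {
     // Captures "major.minor" and ignores any patch or suffix (e.g. "3.5.1-SNAPSHOT").
     private val MajorMinor = """^(\d+)\.(\d+).*""".r

     /** Returns true only for Spark 3.0 through 3.5. */
     def isSupported(sparkVersion: String): Boolean = sparkVersion match {
       case MajorMinor(major, minor) =>
         major.toInt == 3 && minor.toInt >= 0 && minor.toInt <= 5
       case _ => false // unparseable versions are treated as unsupported
     }
   }
   ```

   In a running session the input would typically be the value of `spark.version`; treating an unparseable version as unsupported is an assumption made for this sketch.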
   
   #### 3. **Build & Integration**
   - **Maven**: New profile `hudi-0.15` with enforcer rules
     - Validates `hudiEnabled=true` property
     - Restricts Spark to 3.0–3.5
     - Pins Hudi version to 0.15.0
   
   - **Build Script**: Enhanced `auron-build.sh`
     - Added `--hudi <VERSION>` parameter
     - Version compatibility validation
     - Auto-enables `hudiEnabled` property
   
   - **CI/CD**: New workflow `.github/workflows/hudi.yml`
     - Matrix testing: Spark 3.0–3.5 × JDK 8/17/21 × Scala 2.12
     - Independent Hudi test pipeline
   
   ### Are there any user-facing changes?
   
   #### New Configuration Option
   
   ```scala
   // Enable Hudi native scan (enabled by default)
   spark.conf.set("spark.auron.enable.hudi.scan", "true")
   ```
   
   ### How was this patch tested?
   
   Added JUnit tests.

