ruoliu2 opened a new pull request, #11436:
URL: https://github.com/apache/incubator-gluten/pull/11436

   ## Summary
   - Add support for detecting and evaluating lakehouse table formats (Iceberg, Delta Lake, Hudi, Paimon) in the qualification tool
   - Previously, the tool recognized only raw file formats and reported negative improvements for lakehouse workloads
   - Use a multi-signal detection approach, tried in priority order: provider class extraction > location pattern matching > node name patterns
   
   ## Changes
   - **LakehouseFormatDetector**: Central detector coordinating multiple detection strategies
   - **ProviderClassExtractor**: Extracts the provider class from the node description (most reliable)
   - **LocationPatternMatcher**: Fallback detection via location path patterns (`_delta_log`, `.hoodie`)
   - **NodeSupportVisitor**: Runs lakehouse detection before raw format detection
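
The detection order described above can be illustrated with a minimal Python sketch. It is not the code from this PR: the function names, the package prefixes used to recognize a provider class, and the node name keywords are all assumptions chosen for illustration. Only the `_delta_log` and `.hoodie` markers and the priority order come from the description.

```python
from typing import Callable, Optional, Tuple

# Assumed package prefixes for each format's provider classes (illustrative only).
PROVIDER_PREFIXES = {
    "org.apache.iceberg": "iceberg",
    "io.delta": "delta",
    "org.apache.hudi": "hudi",
    "org.apache.paimon": "paimon",
}

# Location markers named in the PR description.
LOCATION_MARKERS = {"_delta_log": "delta", ".hoodie": "hudi"}

# Assumed keywords for the last-resort node name check.
NODE_NAME_KEYWORDS = ("iceberg", "delta", "hudi", "paimon")


def by_provider_class(node_name: str, description: str) -> Optional[str]:
    """Strongest signal: a known provider package appears in the description."""
    for prefix, fmt in PROVIDER_PREFIXES.items():
        if prefix in description:
            return fmt
    return None


def by_location(node_name: str, description: str) -> Optional[str]:
    """Fallback: a format-specific metadata directory appears in the description."""
    for marker, fmt in LOCATION_MARKERS.items():
        if marker in description:
            return fmt
    return None


def by_node_name(node_name: str, description: str) -> Optional[str]:
    """Weakest signal: the node name itself mentions a format."""
    lowered = node_name.lower()
    for keyword in NODE_NAME_KEYWORDS:
        if keyword in lowered:
            return keyword
    return None


# Strategies are tried in priority order; the first non-None answer wins.
Strategy = Callable[[str, str], Optional[str]]
STRATEGIES: Tuple[Strategy, ...] = (by_provider_class, by_location, by_node_name)


def detect_lakehouse_format(node_name: str, description: str) -> Optional[str]:
    for strategy in STRATEGIES:
        fmt = strategy(node_name, description)
        if fmt is not None:
            return fmt
    return None  # not a lakehouse scan; raw format detection would run next
```

Because the strategies are tried in order, a description that carries both an Iceberg provider class and a `_delta_log` path resolves to Iceberg in this sketch, which mirrors the "most reliable first" ordering.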
   
   ## Key Design Decisions
   - Support is conditional on the detected format; it is NOT blanket operator support
   - `BatchScanExec` is NOT added to the `FULLY_SUPPORTED_OPERATORS` list
   - The underlying file format (Parquet/ORC) determines actual support
   - The same schema type checks apply to lakehouse scans as to raw formats
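
As a rough illustration of these decisions, the Python sketch below gates a lakehouse scan on its underlying file format and schema instead of whitelisting the operator. Everything in it is hypothetical: the contents of the operator set, the unsupported type used as a placeholder, and the helper names are assumptions, not the tool's actual lists or API.

```python
from typing import Optional, Sequence

# Illustrative stand-in for the tool's list; BatchScanExec is deliberately absent.
FULLY_SUPPORTED_OPERATORS = {"FilterExec", "ProjectExec"}

# File formats taken from the bullet above; treated here as the supported set.
SUPPORTED_FILE_FORMATS = {"parquet", "orc"}

# Hypothetical placeholder for whatever schema types the raw-format check rejects.
UNSUPPORTED_SCHEMA_TYPES = {"calendarinterval"}


def schema_is_supported(column_types: Sequence[str]) -> bool:
    """Shared schema check, applied identically to raw and lakehouse scans."""
    return all(t.lower() not in UNSUPPORTED_SCHEMA_TYPES for t in column_types)


def lakehouse_scan_supported(
    lakehouse_format: Optional[str],
    file_format: Optional[str],
    column_types: Sequence[str],
) -> bool:
    """Support depends on what the scan reads, not on the operator name."""
    if lakehouse_format is None or file_format is None:
        return False  # nothing detected; this helper does not apply
    if file_format.lower() not in SUPPORTED_FILE_FORMATS:
        return False
    return schema_is_supported(column_types)
```

Under these assumptions, an Iceberg table backed by Parquet with plain column types is reported as supported, while the same table backed by Avro, or one with a rejected column type, is not.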
   
   ## Test plan
   - [x] 23 unit tests for LakehouseFormatDetector
   - [x] 16 integration tests for NodeSupportVisitor with lakehouse formats
   - [x] All tests passing
   
   Closes #11417
   
   🤖 Generated with [Claude Code](https://claude.ai/code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
