morningman opened a new pull request, #61602:
URL: https://github.com/apache/doris/pull/61602

   ### What problem does this PR solve?
   
   Issue Number: close #xxx
   
   Problem Summary:
   Add Delta Lake as a new external data source in Doris. This enables users to 
create Delta Lake catalogs backed by HMS and query Delta Lake tables directly 
from Doris, including support for Deletion Vectors (DV) and predicate pushdown 
for data skipping.
   
   **Key changes:**
   - **FE Framework**: New `DeltaLakeExternalCatalog`, 
`DeltaLakeExternalDatabase`, `DeltaLakeExternalTable`, `DeltaLakeMetadataOps` 
classes with HMS integration
   - **FE Query Execution**: `DeltaLakeScanNode` using Delta Kernel API for 
file enumeration, `DeltaLakePredicateConverter` for predicate pushdown, 
`DeletionVectorDescriptorInfo` for DV metadata
   - **BE Reader**: `DeltaLakeParquetReader` that wraps `ParquetReader` and 
applies Deletion Vector filtering by reusing existing `DeletionVectorReader` 
infrastructure (same Puffin/Roaring64Map format as Iceberg V3)
   - **Thrift**: `TDeltaLakeDeletionVectorDesc` and `TDeltaLakeFileDesc` 
structs for FE→BE DV info transfer
   - **Dependencies**: `delta-kernel-api:3.3.0` and 
`delta-kernel-defaults:3.3.0` Maven dependencies
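
For readers unfamiliar with Deletion Vectors, the filtering step described for the BE reader can be pictured roughly as below. This is a conceptual Python sketch only, not the PR's C++ code: the function name `apply_deletion_vector` is hypothetical, and a plain `set` of row positions stands in for the Roaring bitmap that the real format uses.

```python
from typing import Iterable, Iterator, Set


def apply_deletion_vector(rows: Iterable[dict], deleted_positions: Set[int]) -> Iterator[dict]:
    """Yield only the rows whose 0-based position in the data file is not marked deleted.

    `deleted_positions` is an illustrative stand-in for a decoded deletion vector.
    """
    for position, row in enumerate(rows):
        if position not in deleted_positions:
            yield row


# Four rows as they would come out of one data file, in file order.
rows = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
# Positions 1 and 3 are marked deleted for this file.
deleted = {1, 3}

surviving = list(apply_deletion_vector(rows, deleted))
print(surviving)
```

The data file itself is never rewritten; the reader drops the marked positions at scan time.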
   
   ### Release note
   
   Support Delta Lake as an external catalog. Users can create a Delta Lake 
catalog via:
   
   ```sql
   CREATE CATALOG delta_ctl PROPERTIES (
       'type' = 'deltalake',
       'hive.metastore.uris' = 'thrift://host:port'
   );
   ```
   
   Delta Lake tables in the catalog can then be queried with full SQL support, 
including predicate pushdown and Deletion Vector filtering.
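
As a rough illustration of what predicate pushdown for data skipping means, the Python sketch below prunes files by comparing a range predicate against per-file min/max statistics. It is a simplified model under stated assumptions, not the logic of `DeltaLakeScanNode` or `DeltaLakePredicateConverter`: the `FileStats` type, the `files_for_range` helper, and the file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileStats:
    """Hypothetical per-file statistics for a single integer column."""
    path: str
    min_value: int
    max_value: int


def files_for_range(files: List[FileStats], lower: int, upper: int) -> List[str]:
    """Keep files whose [min, max] range overlaps the predicate lower <= col <= upper."""
    return [f.path for f in files if f.max_value >= lower and f.min_value <= upper]


files = [
    FileStats("part-0.parquet", 1, 100),
    FileStats("part-1.parquet", 101, 200),
    FileStats("part-2.parquet", 201, 300),
]

# Predicate: col BETWEEN 150 AND 220. part-0 cannot contain a match and is skipped.
selected = files_for_range(files, 150, 220)
print(selected)
```

Files that survive pruning still need row-level filtering, since the statistics only prove that a file may contain matching rows.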
   
   ### Check List (For Author)
   
   - Test: Regression test / Unit Test
       - `regression-test/suites/external_table_p0/deltalake/test_deltalake_catalog.groovy`
       - `regression-test/suites/external_table_p0/deltalake/test_deltalake_query.groovy`
       - `fe/fe-core/src/test/java/org/apache/doris/datasource/deltalake/DeltaLakePredicateConverterTest.java`
   - Behavior changed: No
   - Does this need documentation: Yes (to be added separately)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

