morningman opened a new pull request, #61602:
URL: https://github.com/apache/doris/pull/61602
### What problem does this PR solve?
Issue Number: close #xxx
Problem Summary:
Add Delta Lake as a new external data source in Doris. This lets users create
Delta Lake catalogs backed by Hive Metastore (HMS) and query Delta Lake tables
directly from Doris, with support for Deletion Vectors (DV) and predicate
pushdown for data skipping.
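Data skipping via predicate pushdown generally means comparing a pushed-down predicate against per-file column statistics and dropping files that cannot contain matching rows. Delta Lake can record such statistics (`minValues`, `maxValues`) on the `add` actions in its transaction log. The Java sketch below shows the idea for a single `col > literal` predicate on a `long` column. It is a hypothetical illustration, not the PR's `DeltaLakePredicateConverter` or the Delta Kernel API. `FileStats`, `canSkipGreaterThan` and `filesToScan` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of min/max-based file skipping; not the PR's implementation.
final class MinMaxSkippingSketch {
    // Hypothetical per-file statistics for one long column.
    record FileStats(String path, long min, long max) {}

    // For "col > literal", a file can be skipped when even its max value fails the predicate.
    static boolean canSkipGreaterThan(FileStats stats, long literal) {
        return stats.max() <= literal;
    }

    // Returns the paths of files that may still contain matching rows.
    static List<String> filesToScan(List<FileStats> files, long literal) {
        List<String> result = new ArrayList<>();
        for (FileStats f : files) {
            if (!canSkipGreaterThan(f, literal)) {
                result.add(f.path());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<FileStats> files = List.of(
                new FileStats("part-0.parquet", 1, 50),
                new FileStats("part-1.parquet", 51, 100),
                new FileStats("part-2.parquet", 101, 150));
        // WHERE id > 100: only part-2 can contain matching rows.
        System.out.println(filesToScan(files, 100)); // prints [part-2.parquet]
    }
}
```

Real pruning also has to handle nulls, other comparison operators, and files without statistics (which can never be skipped). The sketch deliberately ignores those cases.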
**Key changes:**
- **FE Framework**: New `DeltaLakeExternalCatalog`,
`DeltaLakeExternalDatabase`, `DeltaLakeExternalTable`, `DeltaLakeMetadataOps`
classes with HMS integration
- **FE Query Execution**: `DeltaLakeScanNode` using Delta Kernel API for
file enumeration, `DeltaLakePredicateConverter` for predicate pushdown,
`DeletionVectorDescriptorInfo` for DV metadata
- **BE Reader**: `DeltaLakeParquetReader`, which wraps `ParquetReader` and
applies Deletion Vector filtering by reusing the existing `DeletionVectorReader`
infrastructure (same Puffin/Roaring64Map format as Iceberg V3)
- **Thrift**: `TDeltaLakeDeletionVectorDesc` and `TDeltaLakeFileDesc`
structs for passing DV metadata from FE to BE
- **Dependencies**: `delta-kernel-api:3.3.0` and
`delta-kernel-defaults:3.3.0` Maven dependencies
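Conceptually, a Deletion Vector is a bitmap of row positions within one data file that are logically deleted. The reader drops those positions after decoding. The minimal Java sketch below illustrates only that filtering step. It is not the PR's C++ `DeltaLakeParquetReader`. A plain `Set<Long>` stands in for `Roaring64Map`, and `DeletionVectorFilterSketch` and `applyDeletionVector` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of Deletion Vector filtering; a Set<Long> replaces Roaring64Map.
final class DeletionVectorFilterSketch {
    // Keeps only rows whose file-relative position is not marked deleted.
    // firstRowPosition is the position of rows.get(0) within the data file.
    static <T> List<T> applyDeletionVector(List<T> rows, long firstRowPosition, Set<Long> deleted) {
        List<T> kept = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            long position = firstRowPosition + i;
            if (!deleted.contains(position)) {
                kept.add(rows.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // A batch holding rows 4..8 of a file whose DV marks rows 1, 5 and 7 as deleted.
        List<String> batch = List.of("r4", "r5", "r6", "r7", "r8");
        Set<Long> dv = Set.of(1L, 5L, 7L);
        System.out.println(applyDeletionVector(batch, 4, dv)); // prints [r4, r6, r8]
    }
}
```

Because the filter is keyed on file-relative row positions, the reader must know where each batch starts within its file. This is why DV filtering generally sits in the per-file reader rather than being applied after rows from different files are combined.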
### Release note
Support Delta Lake as an external catalog. Users can create a Delta Lake
catalog via:
```sql
CREATE CATALOG delta_ctl PROPERTIES (
'type' = 'deltalake',
'hive.metastore.uris' = 'thrift://host:port'
);
```
Delta Lake tables in the catalog can then be queried with SQL, with predicate
pushdown and Deletion Vector filtering applied during scans.
### Check List (For Author)
- Test: Regression test / Unit Test
    - `regression-test/suites/external_table_p0/deltalake/test_deltalake_catalog.groovy`
    - `regression-test/suites/external_table_p0/deltalake/test_deltalake_query.groovy`
    - `fe/fe-core/src/test/java/org/apache/doris/datasource/deltalake/DeltaLakePredicateConverterTest.java`
- Behavior changed: No
- Does this need documentation: Yes (to be added separately)
--
This is an automated message from the Apache Git Service.