rinchinov opened a new pull request, #19611: URL: https://github.com/apache/druid/pull/19611
### Description

Adds an **embedded end-to-end integration test** for the Delta Lake input source, as suggested by @abhishekrb19 in https://github.com/apache/druid/pull/19592#issuecomment-4746247174. It is modeled on the existing Iceberg embedded test (`IcebergRestCatalogIngestionTest`). Unlike Iceberg, Delta Lake reads directly from a local filesystem path, so no catalog or testcontainer is required. The test ingests a Delta table through a native `IndexTask` using `DeltaInputSource` and verifies the result over a real embedded Druid cluster (overlord, coordinator, indexer, broker, historical).

#### What it covers

The test reuses the regression table from #19592: **2 Parquet files × 2000 rows = 4000 rows total**. Because each file exceeds the Delta kernel's default batch size of 1024 rows, this is the integration-level counterpart of the unit test `DeltaInputSourceTest.BatchDrainRegressionTests` for the per-file batch-drain bug (#18606):

- **Without the fix:** `1024 × 2 = 2048` rows ingested
- **With the fix:** `4000` rows ingested

Assertions:

- `COUNT(*)` = 4000 (exact; the core regression signal)
- `MIN/MAX(__time)` bounded by the `id` column's documented min/max (0 and 3999), confirming rows from both files were read

#### Depends on #19592

This test exercises the fix in #19592 and is **green only with that fix present**. On `master` (which still has the bug) it asserts 4000 but the input source returns 2048, so CI here will be red until #19592 is merged. Kept as a **draft** for that reason. The change is otherwise self-contained (test + copied Delta table resource + a test-scoped `druid-deltalake-extensions` dependency).

#### Key changed/added classes in this PR

- `DeltaLakeInputSourceIngestionTest` (new embedded e2e test)
- `embedded-tests/pom.xml` (test-scoped `druid-deltalake-extensions` dependency)
- `embedded-tests/src/test/resources/delta/large-row-group-table` (Delta table fixture)

<hr>

This PR has:

- [x] been self-reviewed.
- [x] added unit tests or modified existing tests to cover new code paths.
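
For reviewers, the two row counts quoted under "What it covers" can be reproduced with a toy model. This is only a sketch of the arithmetic, not the code in `DeltaInputSource`: it assumes the bug amounts to reading just the first batch of each file, and the class and method names (`BatchDrainSketch`, `rowsRead`) are hypothetical.

```java
import java.util.List;

public class BatchDrainSketch {
    /**
     * Counts the rows a reader would return for the given files.
     * drainAllBatches = false models the assumed buggy behavior
     * (stop after the first batch of each file); true models the fix.
     */
    public static int rowsRead(List<Integer> rowsPerFile, int batchSize, boolean drainAllBatches) {
        int total = 0;
        for (int fileRows : rowsPerFile) {
            int remaining = fileRows;
            while (remaining > 0) {
                // A batch holds at most batchSize rows.
                int batch = Math.min(batchSize, remaining);
                total += batch;
                remaining -= batch;
                if (!drainAllBatches) {
                    break; // assumed bug: later batches of this file are dropped
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Fixture shape from the description: 2 files x 2000 rows, batch size 1024.
        List<Integer> files = List.of(2000, 2000);
        System.out.println("without fix: " + rowsRead(files, 1024, false));
        System.out.println("with fix:    " + rowsRead(files, 1024, true));
    }
}
```

Under these assumptions the first call yields 1024 rows per file and the second yields all 2000 per file, which is why an exact `COUNT(*)` assertion is enough to distinguish the two cases.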
