rinchinov opened a new pull request, #19611:
URL: https://github.com/apache/druid/pull/19611

   ### Description
   
   Adds an **embedded end-to-end integration test** for the Delta Lake input 
source, as suggested by @abhishekrb19 in 
https://github.com/apache/druid/pull/19592#issuecomment-4746247174.
   
   It is modeled on the existing Iceberg embedded test 
(`IcebergRestCatalogIngestionTest`). Unlike the Iceberg test, this test reads 
the Delta table directly from a local filesystem path, so no catalog or 
testcontainer is required. It ingests the table through a native `IndexTask` 
using `DeltaInputSource` and verifies the result against a real embedded Druid 
cluster (Overlord, Coordinator, Indexer, Broker, Historical).
   
   #### What it covers
   
   The test reuses the regression table from #19592: **2 Parquet files × 2000 
rows = 4000 rows total**. Because each file exceeds the Delta kernel's default 
batch size of 1024 rows, this is the integration-level counterpart of the unit 
test `DeltaInputSourceTest.BatchDrainRegressionTests` for the per-file 
batch-drain bug (#18606):
   
   - **Without the fix:** `1024 × 2 = 2048` rows ingested (only the first 
1024-row batch of each file is read)
   - **With the fix:** `4000` rows ingested
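   To make the arithmetic above concrete, here is a minimal, hypothetical simulation of the per-file batch-drain pattern. It is not Druid's or the Delta kernel's actual code: each Parquet file is modeled as an iterator of row batches capped at the 1024-row default batch size, and the names `BatchDrainDemo`, `batchesFor`, `countFirstBatchOnly`, and `countAllBatches` are illustrative only. Reading just the first batch per file reproduces the 2048-row symptom; draining every batch yields the expected 4000.

   ```java
   import java.util.ArrayList;
   import java.util.Iterator;
   import java.util.List;

   /**
    * Hypothetical simulation of the per-file batch-drain bug (#18606).
    * Each "file" is modeled as an iterator over batch sizes; this is an
    * illustration, not the DeltaInputSource implementation.
    */
   class BatchDrainDemo {
     // Default Delta kernel batch size cited in the PR description.
     static final int BATCH_SIZE = 1024;

     // Splits a file of `rows` rows into batches of at most BATCH_SIZE rows.
     static Iterator<Integer> batchesFor(int rows) {
       List<Integer> batchSizes = new ArrayList<>();
       for (int remaining = rows; remaining > 0; remaining -= BATCH_SIZE) {
         batchSizes.add(Math.min(remaining, BATCH_SIZE));
       }
       return batchSizes.iterator();
     }

     // Buggy pattern: consumes only the first batch of each file, then moves on.
     static long countFirstBatchOnly(int[] fileRowCounts) {
       long total = 0;
       for (int rows : fileRowCounts) {
         Iterator<Integer> batches = batchesFor(rows);
         if (batches.hasNext()) {
           total += batches.next();
         }
       }
       return total;
     }

     // Fixed pattern: drains every batch of each file before moving to the next.
     static long countAllBatches(int[] fileRowCounts) {
       long total = 0;
       for (int rows : fileRowCounts) {
         Iterator<Integer> batches = batchesFor(rows);
         while (batches.hasNext()) {
           total += batches.next();
         }
       }
       return total;
     }

     public static void main(String[] args) {
       int[] files = {2000, 2000}; // the regression table: 2 files x 2000 rows
       System.out.println("first batch only: " + countFirstBatchOnly(files));
       System.out.println("all batches:      " + countAllBatches(files));
     }
   }
   ```

   With 2000-row files, `batchesFor` yields batches of 1024 and 976 rows, so stopping after the first batch loses 976 rows per file, which matches the 2048 vs 4000 gap the embedded test detects.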
   
   Assertions:
   - `COUNT(*)` = 4000 (exact; the core regression signal)
   - `MIN/MAX(__time)` bounded by the `id` column's documented min/max (0 and 
3999), confirming rows from both files were read
   
   #### Depends on #19592
   
   This test exercises the fix in #19592 and is **green only with that fix 
present**. On `master` (which still has the bug) it asserts 4000 but the input 
source returns 2048, so CI here will be red until #19592 is merged. Kept as a 
**draft** for that reason. The change is otherwise self-contained (test + 
copied Delta table resource + a test-scoped `druid-deltalake-extensions` 
dependency).
   
   #### Key changed/added classes in this PR
   - `DeltaLakeInputSourceIngestionTest` (new embedded e2e test)
   - `embedded-tests/pom.xml` (test-scoped `druid-deltalake-extensions` 
dependency)
   - `embedded-tests/src/test/resources/delta/large-row-group-table` (Delta 
table fixture)
   
   <hr>
   
   This PR has:
   
   - [x] been self-reviewed.
   - [x] added unit tests or modified existing tests to cover new code paths.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.