schenksj commented on PR #3932:
URL:
https://github.com/apache/datafusion-comet/pull/3932#issuecomment-4238055079
> This is awesome @schenksj! Thank you!
>
> At 6,500 lines, I'd like to take some time to review this one in stages.
Without looking too closely at it yet, the first questions that come to mind
that I want to look at first:
>
> 1. Does Delta have a Spark test suite we can run, similar to what we do
with Iceberg's Java library to have confidence in the implementation's
correctness?
> 2. Are there significant downstream dependencies we pick up with this
change? We're already struggling a bit with `iceberg-rust` being locked to
specific DataFusion and Arrow-rs versions. Does this bring the same challenge,
and would Comet be limited from upgrading until _all_ dependencies are in sync?
>
> Like @andygrove, I am mostly concerned about the maintenance burden.
Though perhaps I am more concerned about future Comet changes than I am about
maintaining this new delta code. I am imagining future major Comet changes like
rewriting our rules to run later to be compatible with AQE improvements in
Spark 4.0+, and this delta integration becomes something we have to update or
leave behind. I don't think any of this should be disqualifying from a merge,
but it's another reason I want to sit with the PR for a bit. I'd like to try to
imagine ways we could be possibly boxed in by this code.
>
> Thank you again for the contribution! I am looking forward to digging into
it this week.
Happy to answer any questions you have. Fortunately, I think most of the
actual code is test cases.
1. Will look into the whether there is a delta spark acceptance suite like
the Iceberg one you mentioned
2. Dependency creep... This is using semver-based crate identity, so both
versions of arrow co-exist. I suspect that given that we're not integrating
FFI from delta-rs, that we might be able to force use of the newer arrow
without issue. Will look into it. If not, the tradeoff is binary size but
not symbol conflicts (thank you rust).
## Downstream Dependencies Added by This PR
### Direct Dependencies (Cargo.toml)
| Crate | Version | Purpose |
|-------|---------|---------|
| `delta_kernel` | 0.19 | Delta log replay (scan planning only) |
| `object_store` | 0.12 (renamed `object_store_kernel`) | Required by
kernel's engine (S3/Azure) |
| `roaring` | 0.10 | Deletion vector bitmap decoding |
| `thiserror` | workspace | Error type derive (already in workspace, added
to core's deps) |
### Transitive: Second Versions of Existing Crates (16)
Kernel pins arrow-57 / parquet-57 / object_store-0.12 internally. These
coexist alongside Comet's arrow-58 / parquet-58 / object_store-0.13. No types
cross the boundary.
`arrow`, `arrow-arith`, `arrow-array`, `arrow-buffer`, `arrow-cast`,
`arrow-csv`, `arrow-data`, `arrow-ipc`, `arrow-json`, `arrow-ord`, `arrow-row`,
`arrow-schema`, `arrow-select`, `arrow-string`,
`parquet`, `object_store`
### Truly New Crates (10)
| Crate | Pulled by | Purpose |
|-------|-----------|---------|
| `delta_kernel` | direct | Core dependency |
| `delta_kernel_derive` | delta_kernel | Proc macros for kernel |
| `roaring` | direct | Bitmap codec for deletion vectors |
| `crc` / `crc-catalog` | delta_kernel | Checksum validation for
checkpoints |
| `lz4_flex` | delta_kernel | Log entry compression |
| `z85` | delta_kernel | Deletion vector encoding |
| `document-features` | delta_kernel | Doc generation |
| `litrs` | delta_kernel_derive | Literal parsing for proc macros |
| `rustls-pemfile` | kernel's rustls engine | TLS certificate loading |
| `crossterm` / `crossterm_winapi` | delta_kernel | Transitive dev
dependency |
### Java/Scala Dependencies
None added to production. Delta-spark is test-scope only (unchanged from
before this PR).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]