schenksj commented on PR #3932: URL: https://github.com/apache/datafusion-comet/pull/3932#issuecomment-4236474713
> Thanks @schenksj. Could you fix the linter issues (see contributor guide for instructions).
>
> Thanks for acknowledging that this was written by AI. This is a very large PR for a significant new feature. Adding support for Delta Lake certainly has value, but we need to consider who is going to maintain this code going forward. I am concerned that if we merge this and then there are changes in the delta-lake-rs dependency in the future then it could cause an extra maintenance burden on the existing maintainers, who are more focused on Iceberg support and have been contributing to Iceberg as well.
>
> Could you tell me more about the motivation for this work? Do you have any suggestions for how this could be maintained in the future?

Hi Andy,

First, thanks for the quick response; I appreciate it. On the AI side, I think it's better to use the best tools available and be honest about our processes, so that we can mature our practices and focus as an industry. To address your questions:
The motivation on my side is that my day-job employer is a significant user of Delta, and I find the current state and future direction of Delta UniForm, particularly its openness, a bit unclear. It is important for us to preserve vendor flexibility within our Spark stacks, and having a viable accelerator outside of Databricks is a key part of that. This work is a step in that direction.

From a maintainability perspective, I have a couple of thoughts. The design of this PR intentionally minimizes direct reliance on delta-rs by using the kernel only for scan planning, not execution. It also has fairly extensive test cases to detect regressions, though as you know that has its own limitations. As long as Comet continues to directly support Parquet, this approach should remain relatively stable over time.

That said, there is an opportunity to move toward a more pluggable architecture. For example, a third-party library, such as a Delta or Hudi provider, could implement a native scan planning interface exposed by Comet. This would allow dependencies and integrations to be fully externalized and would shift the maintenance burden to the plugin owner.

Longer term, I would like to see [IndexTables](https://indextables.io) and Comet become compatible to help accelerate joins and similar operations on plain Spark. Achieving that would likely require a more robust plugin model that supports not just scan planning but also FFI-based columnar streaming. That is a more involved effort and likely a ways out, given the current state of my codebase.

I'd love your thoughts, and of course no hard feelings if this doesn't align with where you want to focus the project.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
