Thanks for sharing the doc! I left some comments, but I think the general design looks great and I'm excited to see this new source in Beam.

Best,
Ahmed

On Tue, May 12, 2026 at 2:28 PM Chamikara Jayalath via dev <[email protected]> wrote:

> Hi all,
>
> I'm looking into implementing a Delta Lake [1] source for Apache Beam.
>
> Some of the highlights are listed below.
>
> * Add support for reading data from an existing Delta Lake table (at HEAD,
>   which could be past the latest checkpoint).
> * Support reading from a specific checkpoint (latest or past).
> * Use the new Delta Kernel API to implement the source.
> * Support parallelized reading via initial splitting and/or dynamic work
>   rebalancing.
> * Support for Beam managed I/O - this will automatically make the
>   connector available to Python SDK and will also allow runners to manage the
>   version of the connector.
>
> A design doc is available here:
> https://s.apache.org/beam-delta-lake-source
>
> Please let me know if you have any comments/questions.
>
> Thanks,
> Cham
>
> [1] https://delta.io/
