Hi all, I'm looking into implementing a Delta Lake [1] source for Apache Beam.
Some of the highlights are listed below.

* Add support for reading data from an existing Delta Lake table (at HEAD, which could be past the latest checkpoint).
* Support reading from a specific checkpoint (latest or past).
* Use the new Delta Kernel API to implement the source.
* Support parallelized reading via initial splitting and/or dynamic work rebalancing.
* Support Beam managed I/O. This will automatically make the connector available to the Python SDK and will also allow runners to manage the version of the connector.

A design doc is available here: https://s.apache.org/beam-delta-lake-source

Please let me know if you have any comments/questions.

Thanks,
Cham

[1] https://delta.io/
