Thelin90 opened a new issue, #12598:
URL: https://github.com/apache/druid/issues/12598
### Motivation

Hello. I have been using Apache Druid on and off for some time. Most recently, I used it as the base for a Lakehouse architecture I designed. I have been deploying it on my own on vanilla K8S, without any operators or hosted clouds, with good success.

However, I am a big fan of the open-source data layer created by Databricks, [delta.io](http://delta.io/). As a result, I have ended up doing the tedious round trip of delta -> raw parquet -> load into Druid.

Presto recently added a [delta.io](http://delta.io/) connector that reads Delta tables directly. This makes me consider using a Presto cluster instead, since it would scale well enough for the data volume I currently work with.

Has anyone faced this issue and solved it in a "nice" way? Or are there plans to add a Delta connector similar to the one Presto has built?

Here is a reference to what Presto has added: https://prestodb.io/blog/2022/03/15/native-delta-lake-connector-for-presto

And a video: https://www.youtube.com/watch?v=JrXGkqpl7xk (fast forward to `21:40`).

### Proposed changes

Implement a delta.io connector to load Delta tables into Apache Druid.

### Rationale

Currently, when delta.io is your underlying storage layer, it is very tedious and somewhat inefficient to make the data accessible to `apache druid`. You have to run a separate service that extracts version `N` of a Delta table into raw parquet first.

### Operational impact

The connector should be implemented without breaking backwards compatibility.
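The Rationale section describes a manual workaround: extracting version `N` of a Delta table so that Druid can ingest it as plain parquet. The core of that step can be sketched in Python by replaying the Delta transaction log (`_delta_log/`).

This is a minimal, hypothetical sketch. `delta_files_at_version` is an illustrative name, not a Druid or Delta API. The sketch makes these assumptions:

- Every JSON commit file from version 0 to `N` is still present.
- `add`/`remove` paths are relative to the table root.
- Checkpoint files and newer protocol features can be ignored.

```python
import json
import os
from urllib.parse import unquote


def delta_files_at_version(table_path: str, version: int) -> list[str]:
    """Return the parquet data files that make up `version` of a Delta table.

    Hypothetical sketch: assumes all JSON commits 0..version still exist in
    _delta_log (no checkpoint handling) and that file paths are relative.
    """
    log_dir = os.path.join(table_path, "_delta_log")
    active: set[str] = set()
    for v in range(version + 1):
        # Delta commit files are named by version, zero-padded to 20 digits.
        commit = os.path.join(log_dir, f"{v:020d}.json")
        with open(commit, encoding="utf-8") as fh:
            # Each non-empty line of a commit file is one JSON action.
            for line in fh:
                if not line.strip():
                    continue
                action = json.loads(line)
                if "add" in action:
                    # Paths are stored URI-encoded, e.g. partition values.
                    active.add(unquote(action["add"]["path"]))
                elif "remove" in action:
                    active.discard(unquote(action["remove"]["path"]))
                # Other actions (protocol, metaData, commitInfo) are ignored.
    return sorted(os.path.join(table_path, p) for p in active)
```

The returned paths could then be listed under `files` of a `local` input source in a Druid native batch ingestion spec that uses the `parquet` input format (provided by the `druid-parquet-extensions` extension). A real integration should rely on a maintained Delta reader rather than this simplified log replay.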
