abhishekagarwal87 commented on issue #12746: URL: https://github.com/apache/druid/issues/12746#issuecomment-1176046730
IMO it seems a bit weird to model `iceberg` and `delta` as input sources. They look like file formats to me, though more of a logical file format than a physical one.

If I understand correctly, one reason we want to use `SQLInputSource` is the flexibility and ease of use of `SQL` as an interface. The underlying protocol used to read data from these formats can still be anything that is splittable and performant. We will get the SQL interface once [we support batch ingestion via SQL](https://github.com/apache/druid/issues/11929). For a splittable, performant read path, we should implement a custom input format for these formats.

There will be some work here that goes beyond implementing a format. For example, if I write `select * from iceberg_table where date=XYZ`, how does that translate into a set of iceberg folders/files to be read? I think this is where option 1) suggested by Gian is handy, though it's limiting in the sense that it cannot do anything fancier. Say I want to read just one column C of a parquet file on S3. In that case, I would want to issue a range GET request to S3 with the start and end pointing to the section for column C within that parquet file. This is possible if the parquet format itself is constructing the S3 requests.

There is also potential synergy with the [Druid catalog proposal](https://github.com/apache/druid/issues/12546). If this catalog understands iceberg or delta tables, then the catalog can be used to filter the files that need to be accessed for that table.
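
To make the range-read idea concrete, here is a minimal sketch of how a format-level reader might turn the byte span of one column chunk into an HTTP `Range` header value, which is the form S3 accepts on a GET. Everything named here (`ColumnRangeSketch`, `ColumnChunk`, `rangeHeader`) is hypothetical and not part of Druid, Parquet, or the AWS SDK. It also assumes the chunk's start offset and length have already been read from the parquet footer metadata.

```java
// Hypothetical sketch: these names are illustrative and do not exist in Druid,
// the Parquet libraries, or the AWS SDK.
final class ColumnRangeSketch {

    /** Byte span of one column chunk, assumed to come from the parquet footer metadata. */
    static final class ColumnChunk {
        final String column;
        final long startOffset; // offset of the chunk's first byte within the file
        final long length;      // size of the chunk in bytes

        ColumnChunk(String column, long startOffset, long length) {
            if (startOffset < 0 || length <= 0) {
                throw new IllegalArgumentException("offset must be >= 0 and length must be > 0");
            }
            this.column = column;
            this.startOffset = startOffset;
            this.length = length;
        }
    }

    /** Builds an HTTP Range header value. Both ends of a byte range are inclusive. */
    static String rangeHeader(ColumnChunk chunk) {
        long lastByte = chunk.startOffset + chunk.length - 1;
        return "bytes=" + chunk.startOffset + "-" + lastByte;
    }

    public static void main(String[] args) {
        // Made-up numbers: column C starts at byte 4096 and is 1024 bytes long.
        ColumnChunk c = new ColumnChunk("C", 4096L, 1024L);
        System.out.println(rangeHeader(c)); // prints bytes=4096-5119
    }
}
```

The point of the sketch is only that the format layer, not the input source, is what knows these offsets, so it is the natural place to construct the request.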
