abhishekagarwal87 commented on issue #12746:
URL: https://github.com/apache/druid/issues/12746#issuecomment-1176046730

   IMO it seems a bit odd to model `iceberg` and `delta` as input sources. 
They look more like file formats to me - logical file formats rather than 
physical ones. If I understand correctly, one reason we want to use 
`SQLInputSource` is the flexibility and ease of use of `SQL` as an interface. 
The underlying protocol for reading data in these formats can still be 
anything that is splittable and performant. We will get the former once [we 
support batch ingestion via SQL](https://github.com/apache/druid/issues/11929). 
For the latter, we should implement custom input formats for Iceberg and Delta. 
   
   There will be some work here that goes beyond implementing a format, e.g. if 
I write "select * from iceberg_table where date=XYZ", how does that translate 
into the set of Iceberg folders/files to be read? I think this is where 
option 1) suggested by Gian is handy, though it's limiting in the sense that it 
cannot do anything fancier. Say I want to read just one column C of a Parquet 
file on S3. In that case, I would want to issue a range GET request to S3 
with the start and end pointing to column C's section within that Parquet file. 
This is only possible if the Parquet format itself is constructing the S3 requests. 
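
   To make the range-read idea concrete, here is a hedged sketch (not Druid or AWS SDK code) of how a format-aware reader could turn Parquet column-chunk metadata into the `Range` header an S3 `GetObject` request would carry for just column C. The `ColumnChunkMeta` class, its field names, and the byte offsets are invented for illustration; a real reader would get these values from the Parquet file footer.

   ```python
   # Illustrative only: column-chunk offsets are made up, not read from a real file.
   from dataclasses import dataclass

   @dataclass
   class ColumnChunkMeta:
       path_in_schema: str        # column name, e.g. "C"
       file_offset: int           # byte offset of the chunk within the file
       total_compressed_size: int # size of the chunk on disk

   def range_header_for_column(chunks, column):
       """Build the HTTP Range header for a single column chunk's bytes."""
       for chunk in chunks:
           if chunk.path_in_schema == column:
               start = chunk.file_offset
               end = start + chunk.total_compressed_size - 1  # Range is inclusive
               return {"Range": f"bytes={start}-{end}"}
       raise KeyError(f"column {column!r} not found in row group")

   # Hypothetical row group with two column chunks.
   chunks = [
       ColumnChunkMeta("A", 4, 1024),
       ColumnChunkMeta("C", 1028, 2048),
   ]
   print(range_header_for_column(chunks, "C"))  # {'Range': 'bytes=1028-3075'}
   ```

   The point is that only the format implementation knows these offsets, which is why the format itself needs to be the thing constructing the S3 requests.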
   
   There is also potential synergy with the [Druid catalog 
proposal](https://github.com/apache/druid/issues/12546). If the catalog 
understands Iceberg or Delta tables, then it can be used to filter the 
files that need to be accessed for a given table. 
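
   As a rough sketch of that filtering step (illustrative only, not the catalog proposal's API): a catalog that knows a table's partition layout could prune its file manifest against a filter like `date = '2022-07-01'` before any data is read. The `prune_files` helper, the manifest structure, and the paths below are all invented for the example.

   ```python
   # Hypothetical manifest: each entry maps a file to its partition values.
   def prune_files(manifest, filters):
       """Return only files whose partition values satisfy every filter."""
       return [
           entry["path"]
           for entry in manifest
           if all(entry["partition"].get(k) == v for k, v in filters.items())
       ]

   manifest = [
       {"path": "s3://bucket/tbl/date=2022-07-01/part-0.parquet",
        "partition": {"date": "2022-07-01"}},
       {"path": "s3://bucket/tbl/date=2022-07-02/part-0.parquet",
        "partition": {"date": "2022-07-02"}},
   ]
   print(prune_files(manifest, {"date": "2022-07-01"}))
   ```

   Ingestion would then only have to touch the pruned file list rather than scanning every file under the table.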
   

