gianm commented on issue #13923:
URL: https://github.com/apache/druid/issues/13923#issuecomment-1466977105

   Very cool! I have heard a lot of people asking about Iceberg integration, so 
I think this kind of capability would be very interesting to the community.
   
   One question I have is whether it makes sense to do this purely as an 
InputSource. Something like:
   
   ```
   "ioConfig": {
     "type": "index_parallel",
     "inputSource": {
       "type": "iceberg",
       "tableName": "logs",
       "namespace": "webapp",
       "partitionColumn": "event_time",
       "intervals": ["2023-01-26T00:00:00.000Z/2023-02-18T00:00:00.000Z"]
     }
   }
   ```
   
   I'm imagining that the IcebergInputSource goes out and finds the data 
backing the table, then delegates (internally) to the appropriate InputSource 
and InputFormat. In your example it'd internally delegate to HdfsInputSource to 
compute splits. For formatting, it would return `false` from `needsFormat`, 
then internally it would use ParquetInputFormat on calls to `reader`.
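
   To make that delegation shape concrete, here is a minimal, hypothetical sketch. These are simplified stand-in interfaces, not Druid's actual `InputSource`/`InputFormat` APIs, and the catalog lookup and file paths are made up for illustration:

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class IcebergDelegationSketch {
       // Stand-in for an InputFormat: turns one file (split) into rows.
       interface InputFormat {
           String read(String file);
       }

       static class IcebergInputSource {
           private final List<String> dataFiles;
           private final InputFormat delegateFormat;

           IcebergInputSource(String namespace, String table, InputFormat format) {
               // Pretend catalog lookup: a real implementation would ask the
               // Iceberg catalog which data files match the requested intervals.
               this.dataFiles = List.of(
                   "hdfs://nn/warehouse/" + namespace + "/" + table + "/part-0.parquet",
                   "hdfs://nn/warehouse/" + namespace + "/" + table + "/part-1.parquet");
               this.delegateFormat = format;
           }

           // The format is chosen internally (e.g. Parquet), so callers never
           // supply one -- mirroring needsFormat() returning false.
           boolean needsFormat() {
               return false;
           }

           // Splits would really come from an inner HDFS-style source; here
           // each backing data file is simply its own split.
           List<String> createSplits() {
               return dataFiles;
           }

           // reader(): delegate each split to the internal format.
           List<String> readAll() {
               List<String> rows = new ArrayList<>();
               for (String file : dataFiles) {
                   rows.add(delegateFormat.read(file));
               }
               return rows;
           }
       }

       public static void main(String[] args) {
           InputFormat parquetLike = file -> "rows-from:" + file;
           IcebergInputSource source = new IcebergInputSource("webapp", "logs", parquetLike);
           System.out.println(source.needsFormat());
           System.out.println(source.createSplits().size());
           System.out.println(source.readAll().get(0));
       }
   }
   ```

   The point of the sketch is that the caller only ever sees the `iceberg` source; which concrete source computes splits and which format parses them is an internal detail.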
   
   As to MSQ integration, unless the implementation is doing something weird, 
it should work out of the box with `EXTERN`. EXTERN lets people use any 
InputSource and any InputFormat. We can add nicer SQL syntax for it in a future 
patch, but there is that fallback method.
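
   For illustration, the fallback might look something like this once an `iceberg` input source exists; the source spec and the column signature here are assumptions for the sake of the example, not final syntax:

   ```
   SELECT *
   FROM TABLE(
     EXTERN(
       '{"type": "iceberg", "tableName": "logs", "namespace": "webapp"}',
       '{"type": "parquet"}',
       '[{"name": "event_time", "type": "string"}, {"name": "msg", "type": "string"}]'
     )
   )
   ```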
   
   @maytasm was asking about this on Slack recently, so they might be interested 
as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
