abhishekrb19 commented on code in PR #16288: URL: https://github.com/apache/druid/pull/16288#discussion_r1571388932
##########
docs/ingestion/input-sources.md:
##########
@@ -1141,7 +1141,85 @@ To use the Delta Lake input source, load the extension [`druid-deltalake-extensions`].
 You can use the Delta input source to read data stored in a Delta Lake table. For a given table, the input source scans the latest snapshot from the configured table. Druid ingests the underlying delta files from the table.

-The following is a sample spec:
+|Property|Description|Required|
+|---------|-----------|--------|
+|type|Set this value to `delta`.|yes|
+|tablePath|The location of the Delta table.|yes|
+|filter|A JSON object that filters data files within a snapshot.|no|
+
+### Delta filter object
+
+You can use these filters to filter out data files from a snapshot, reducing the number of files Druid has to ingest from
+a Delta table. This input source provides the following filters: `and`, `or`, `not`, `=`, `>`, `>=`, `<`, `<=`.
+
+When a filter is applied to non-partitioned columns, the filtering is best-effort because the Delta Kernel relies solely
+on statistics collected when the non-partitioned table is created. In this scenario, the Druid connector may ingest
+data that doesn't match the filter. For guaranteed filtering behavior, apply filters only to partitioned columns.

Review Comment:
   Yes, reads much better. Thanks!

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
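To make the property table and filter object concrete, a spec using them might look like the sketch below. This is an illustrative fragment only: the `tablePath`, column names, and filter values are hypothetical, and the filter object shape (`type`, `column`, `value` fields, with nested `filters` for `and`) is assumed from the filter list in the diff above rather than confirmed by this comment thread.

```json
{
  "inputSource": {
    "type": "delta",
    "tablePath": "/path/to/delta-table",
    "filter": {
      "type": "and",
      "filters": [
        { "type": ">=", "column": "event_date", "value": "2024-01-01" },
        { "type": "=",  "column": "region",     "value": "us-east" }
      ]
    }
  }
}
```

Per the caveat in the diff, a filter like this gives guaranteed pruning only if `event_date` and `region` are partition columns of the Delta table; on non-partitioned columns it is best-effort and rows not matching the filter may still be ingested.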
