abhishekrb19 commented on code in PR #18953: URL: https://github.com/apache/druid/pull/18953#discussion_r2734649261
########## docs/ingestion/input-sources.md: ##########
@@ -1063,6 +1063,7 @@ The following is a sample spec for a S3 warehouse source:
 |icebergCatalog|The JSON Object used to define the catalog that manages the configured Iceberg table.|yes|
 |warehouseSource|The JSON Object that defines the native input source for reading the data files from the warehouse.|yes|
 |snapshotTime|Timestamp in ISO8601 DateTime format that will be used to fetch the most recent snapshot as of this time.|no|
+|residualFilterMode|Controls how residual filters are handled when filtering on non-partition columns. When an Iceberg filter targets a non-partition column, files may contain rows that don't match the filter (residual rows). Valid values are: `ignore` (default, ingest all rows), `warn` (log a warning but continue), `fail` (fail the ingestion job). Use `fail` to ensure filters only target partition columns.|no|

Review Comment:
fwiw, this is the same behavior as Delta Lake filtering, where filter predicates are pushed down to partition columns; filtering on non-partition columns is best-effort.

It might also make sense to update `icebergFilter` in the docs to clarify how filtering on partition columns vs. non-partition columns behaves, and perhaps point to this new `residualFilterMode` property.
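The following is a rough sketch of how the new property might appear in an Iceberg input source spec. It assumes a local catalog and warehouse, plus an `equals` filter on a hypothetical non-partition column `country`. The table name, namespace, warehouse path, and filter values are placeholders and do not come from the PR. Going by the proposed docs row, setting `fail` here would fail the ingestion job whenever the filter leaves residual rows, for example because `country` is not a partition column.

```json
{
  "type": "iceberg",
  "tableName": "example_table",
  "namespace": "example_namespace",
  "icebergCatalog": {
    "type": "local",
    "warehousePath": "/path/to/warehouse"
  },
  "icebergFilter": {
    "type": "equals",
    "filterColumn": "country",
    "filterValue": "US"
  },
  "warehouseSource": {
    "type": "local"
  },
  "residualFilterMode": "fail"
}
```

With the default `ignore`, the same spec would ingest every row from the matched data files, including rows where `country` is not `US`.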
