gianm commented on PR #13027: URL: https://github.com/apache/druid/pull/13027#issuecomment-1244734899
> @gianm I would say it is quite common when people are massaging data using Spark into Iceberg.

@didip, would you mind giving an example of how people would use the filter glob to read Iceberg data? I'm not familiar with how Iceberg stores data, so an example would help me understand how the feature is likely to be used.

I'm still concerned about how confusing whole-path globs can be, so if we ship the feature, we should be very clear about how it works. The docs should explain exactly which string the glob is matched against. For example, if your prefix is `s3://mybucket/myprefix` and there is an object `s3://mybucket/myprefix/foo/bar.txt`, is the path `/foo/bar.txt` (the part after the prefix), `foo/bar.txt` (the part after the prefix, with the leading `/` stripped), or `myprefix/foo/bar.txt` (the entire S3 object key)?

We could also offer both a `filter` option (a glob on the file name) and a `pathFilter` option (a glob on the whole path). That would preserve consistency with the `local` input source, where `filter` applies to file names only, and it would give users with simple use cases that don't involve Iceberg a more intuitive, name-based way to match files.
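To make the ambiguity concrete, here is a minimal sketch that lists the candidate strings a glob could be matched against for the example prefix and object, then tests a pattern against each. It is only an illustration, not Druid's implementation: the function names are hypothetical, and it uses Python's `fnmatch.fnmatchcase`, whose `*` also matches `/`. That behavior may differ from whatever matcher Druid actually uses.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# Example values taken from the comment; the helpers below are hypothetical.
PREFIX = "s3://mybucket/myprefix"
OBJECT_URI = "s3://mybucket/myprefix/foo/bar.txt"


def candidate_match_strings(prefix: str, object_uri: str) -> dict:
    """Return each string a glob filter could plausibly be matched against."""
    if not object_uri.startswith(prefix):
        raise ValueError("object is not under the given prefix")
    # Split "s3://bucket/key" into the bucket and the object key.
    _bucket, key = object_uri[len("s3://"):].split("/", 1)
    after_prefix = object_uri[len(prefix):]
    return {
        "after_prefix": after_prefix,                       # "/foo/bar.txt"
        "after_prefix_stripped": after_prefix.lstrip("/"),  # "foo/bar.txt"
        "object_key": key,                                  # "myprefix/foo/bar.txt"
        "file_name": basename(key),                         # "bar.txt"
    }


def matches(pattern: str, candidates: dict) -> dict:
    """Match the glob against every candidate (case-sensitive; '*' also matches '/')."""
    return {name: fnmatchcase(value, pattern) for name, value in candidates.items()}


if __name__ == "__main__":
    candidates = candidate_match_strings(PREFIX, OBJECT_URI)
    # A path-style pattern matches only one of the four interpretations.
    print(matches("foo/*.txt", candidates))
    # {'after_prefix': False, 'after_prefix_stripped': True, 'object_key': False, 'file_name': False}
```

The same pattern `foo/*.txt` selects the object under one interpretation and silently skips it under the other three. A name-only pattern such as `*.txt` matches all four, which is part of why name-based matching is the less surprising default.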
