sqd commented on PR #16339: URL: https://github.com/apache/iceberg/pull/16339#issuecomment-4659319884
I would argue that this is useful exactly when the out-of-the-box sink falls short because the requirement is too specific to belong in it. For example:

1. There is currently no way to specify the S3 storage class of a Parquet file, and I imagine we are not going to add a feature that applies only to S3.
2. There is also no way to collect Parquet writing metrics, and we are probably not going to add a feature for that either, because it is too specific to that one use case.
3. There are many other features that we are never going to add to the out-of-the-box sink, but that some users might find valuable for their own scenarios.

With this extension point, users can roll their own logic for cases like these.

Also, the fact that the sink code was written as `RowDataTaskWriterFactory implements TaskWriterFactory<RowData>` seems to suggest that something like this was in the plan (i.e. allowing another implementation of the factory) :-) .

Please let me know what you think.
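To make the metrics case concrete, here is a minimal sketch of what a user-supplied factory could look like if the sink accepted one. It is only an illustration under stated assumptions: `SketchTaskWriter` and `SketchTaskWriterFactory` are hypothetical, simplified stand-ins for the real Iceberg interfaces (which have more methods), and `CountingTaskWriterFactory` and `InMemoryTaskWriterFactory` are made-up names, not part of this PR.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified stand-in for a task writer (assumption: the real
// interface has additional methods such as abort/complete).
interface SketchTaskWriter<T> {
    void write(T row);
    void close();
}

// Hypothetical stand-in for the factory extension point discussed above.
interface SketchTaskWriterFactory<T> extends Serializable {
    void initialize(int taskId, int attemptId);
    SketchTaskWriter<T> create();
}

// User-side decorator: delegates to any factory and counts rows written
// through the writers it creates.
final class CountingTaskWriterFactory<T> implements SketchTaskWriterFactory<T> {
    private final SketchTaskWriterFactory<T> delegate;
    private final AtomicLong rowsWritten = new AtomicLong();

    CountingTaskWriterFactory(SketchTaskWriterFactory<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void initialize(int taskId, int attemptId) {
        delegate.initialize(taskId, attemptId);
    }

    @Override
    public SketchTaskWriter<T> create() {
        SketchTaskWriter<T> inner = delegate.create();
        return new SketchTaskWriter<T>() {
            @Override
            public void write(T row) {
                inner.write(row);
                rowsWritten.incrementAndGet(); // custom metric hook
            }

            @Override
            public void close() {
                inner.close();
            }
        };
    }

    long rowsWritten() {
        return rowsWritten.get();
    }
}

// Toy delegate that keeps rows in memory, standing in for the default factory.
final class InMemoryTaskWriterFactory<T> implements SketchTaskWriterFactory<T> {
    private final List<T> rows = new ArrayList<>();

    @Override
    public void initialize(int taskId, int attemptId) {
        // nothing to set up in this toy implementation
    }

    @Override
    public SketchTaskWriter<T> create() {
        return new SketchTaskWriter<T>() {
            @Override
            public void write(T row) {
                rows.add(row);
            }

            @Override
            public void close() {
                // nothing to release
            }
        };
    }

    List<T> rows() {
        return rows;
    }
}
```

The same wrapping approach would presumably work for other one-off needs (for instance, tagging output files in a storage-specific way) without any of that logic having to live in the default sink.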
