danielcweeks commented on issue #3062: URL: https://github.com/apache/iceberg/issues/3062#issuecomment-983306313
It's been quite a while since I looked at this (before Spark 3), but at the time Spark relied entirely on Hadoop FileSystem metrics for tracking I/O. I believe we created a shim that pulled IO metrics from S3FileIO and reported them through the Hadoop FileSystem in order to expose this information. I think it is possible to create such a shim in the Iceberg Spark project, but we need to be careful not to leak Hadoop packages into S3FileIO: that would mean adding a metric callback interface to S3FileIO so that it does not pick up a Hadoop dependency. That may provide a workaround until the upstream Spark metrics framework is sorted out.
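
To make the callback idea concrete, here is a minimal sketch of what such an interface could look like. Every name in it (`IoMetricsListener`, `MeteredInputStream`, `CountingListener`, `IoMetricsDemo`) is hypothetical and not part of Iceberg's actual API. The interface and the stream wrapper use only `java.io`, so they could sit next to S3FileIO without any Hadoop imports. The listener here is a stand-in that just counts bytes; an adapter living in the Spark module could implement the same interface and forward the values to Hadoop's `FileSystem.Statistics` counters instead.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical callback owned by the FileIO side; deliberately free of Hadoop types. */
interface IoMetricsListener {
  void onBytesRead(long bytes);
}

/** Wraps any InputStream and reports each successful read to the listener. */
class MeteredInputStream extends FilterInputStream {
  private final IoMetricsListener listener;

  MeteredInputStream(InputStream in, IoMetricsListener listener) {
    super(in);
    this.listener = listener;
  }

  @Override
  public int read() throws IOException {
    int b = super.read();
    if (b >= 0) {
      listener.onBytesRead(1);
    }
    return b;
  }

  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    int n = super.read(buf, off, len);
    if (n > 0) {
      listener.onBytesRead(n);
    }
    return n;
  }
}

/**
 * Stand-in for the Spark-side adapter (assumption: a real one would forward
 * to Hadoop's statistics rather than to a local LongAdder).
 */
class CountingListener implements IoMetricsListener {
  private final LongAdder bytesRead = new LongAdder();

  @Override
  public void onBytesRead(long bytes) {
    bytesRead.add(bytes);
  }

  long bytesRead() {
    return bytesRead.sum();
  }
}

class IoMetricsDemo {
  /** Reads the whole array through a metered stream and returns the byte count. */
  static long drain(byte[] data, IoMetricsListener listener) {
    try (InputStream in = new MeteredInputStream(new ByteArrayInputStream(data), listener)) {
      byte[] buf = new byte[4];
      long total = 0;
      int n;
      while ((n = in.read(buf)) != -1) {
        total += n;
      }
      return total;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    CountingListener listener = new CountingListener();
    drain(new byte[10], listener);
    System.out.println(listener.bytesRead()); // prints 10
  }
}
```

With this split the dependency points one way only: the Spark module knows about both the listener interface and Hadoop, while the FileIO side knows only the interface.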
