[GitHub] [iceberg] danielcweeks commented on issue #3062: Report I/O metrics to Spark

GitBox Tue, 30 Nov 2021 21:38:09 -0800


danielcweeks commented on issue #3062:
URL: https://github.com/apache/iceberg/issues/3062#issuecomment-983306313



   It's been quite a while since I looked at this (prior to Spark 3), but at 
the time, Spark relied entirely on Hadoop FileSystem metrics to track I/O.  
I believe we created a shim that pulls I/O metrics from the S3FileIO and 
reports them via the Hadoop FileSystem in order to expose this information.
   
   I think it is possible to create such a shim in the Iceberg Spark project, 
but we need to be careful not to leak the Hadoop packages and introduce a 
Hadoop dependency. This would mean creating a metric callback interface in 
the S3FileIO.
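
   As a rough sketch of the callback idea (every name below is hypothetical 
and is not an existing Iceberg, S3FileIO, or Hadoop API): the I/O layer 
would depend only on a small interface, and the Spark-side shim would 
implement it. Here the shim only accumulates totals; a real one would 
presumably forward them to the Hadoop FileSystem statistics that Spark reads.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical callback interface. It would live alongside the FileIO
// implementation and deliberately imports nothing from Hadoop.
interface IOMetricsCallback {
    void onBytesRead(long bytes);
    void onBytesWritten(long bytes);
}

// Hypothetical stand-in for the I/O layer. It knows only the callback
// interface, so no Hadoop types leak into it.
final class InstrumentedIO {
    private final IOMetricsCallback callback;

    InstrumentedIO(IOMetricsCallback callback) {
        this.callback = callback;
    }

    // Pretend to fill the buffer from storage, then report the byte count.
    void read(byte[] buffer) {
        callback.onBytesRead(buffer.length);
    }

    // Pretend to send the data to storage, then report the byte count.
    void write(byte[] data) {
        callback.onBytesWritten(data.length);
    }
}

// Hypothetical shim on the Spark side. This sketch only sums the reported
// bytes; forwarding to Hadoop FileSystem statistics is left out on purpose.
final class CountingShim implements IOMetricsCallback {
    private final LongAdder bytesRead = new LongAdder();
    private final LongAdder bytesWritten = new LongAdder();

    @Override
    public void onBytesRead(long bytes) {
        bytesRead.add(bytes);
    }

    @Override
    public void onBytesWritten(long bytes) {
        bytesWritten.add(bytes);
    }

    long totalBytesRead() {
        return bytesRead.sum();
    }

    long totalBytesWritten() {
        return bytesWritten.sum();
    }
}
```

   With this split, only the shim class would need Hadoop on its classpath; 
the I/O layer compiles against the interface alone.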
   
   That may provide a workaround until the upstream Spark metrics framework is 
sorted out.




