jackye1995 commented on pull request #4254:
URL: https://github.com/apache/iceberg/pull/4254#issuecomment-1062258595


   > This is just a convenient way to get metrics to engines that already use 
Hadoop. I think we all agree that we want to be able to extend this, but it is 
a reasonable default for now.
   
   Yes, in that case let's first try to add the `MetricsProducer` mixin 
interface. I agree the catalog-based solution is more like a workaround, let's 
just implement the right thing.
   
   In this model, how would we eventually make this not just a S3FileIO 
specific implementation?
   
   I am thinking about something like the following in SparkCatalog
   
   ```java
   private Pair<Table, Long> load(Identifier ident) {
     ...
     Table table;
     try {
       table = icebergCatalog.loadTable(namespaceAsIdent);
       FileIO io = table.io();
       if (io instanceof MetricsProducer) {
           // Report Hadoop metrics by default, will add logic and option for 
SparkMetricsContext in the future when that is available
           try {
             DynConstructors.Ctor<MetricsContext> ctor =
                 
DynConstructors.builder(MetricsContext.class).hiddenImpl(DEFAULT_METRICS_IMPL, 
String.class).buildChecked();
             // how would scheme be set for ResolvingFileIO? should we take 
table.location() as input to constructor to derive the scheme?
             MetricsContext metrics = ctor.newInstance(io.scheme());
             metrics.initialize(properties);
           } catch (NoSuchMethodException | ClassCastException e) {
             LOG.warn("Unable to load metrics class: '{}', falling back to null 
metrics", DEFAULT_METRICS_IMPL, e);
           }
         ((MetricsProducer) io).setMetricsContext(metrics);
       }
     } catch (Exception ignored) {
       // the namespace does not identify a table, so it cannot be a table with 
a snapshot selector
       // throw the original exception
       throw e;
     }
     ...
   }
   ```
   
   Would this make sense to you?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to