Re: [I] [Feature]: Add table summary metrics [amoro]

via GitHub Sat, 17 Aug 2024 00:39:21 -0700


hhippodnsla commented on issue #3104:
URL: https://github.com/apache/amoro/issues/3104#issuecomment-2294752688


   In my opinion, we can borrow concepts from data warehousing (DW), such as 
identifying dimensions and facts.
   For example, the dimensions could be:
   * which table type? pure Iceberg, mixed Iceberg, or mixed Hive
   * which data type? data or metadata
   * is it in use? expired or still in use
   * which file type for data/metadata? data/eq-del/pos-del/manifest/manifest-list and so on
   * is the table partitioned, and which partition?
   
   The facts could be:
   * number of files
   * total size of files
   * 90th-percentile file size
   * median file size
   * max file size
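
To make the dimension/fact split concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: `FileRecord`, its field names, and `summarize` are hypothetical and are not part of Amoro or Iceberg, and the 90th percentile uses the nearest-rank method, which may differ from what a real reporter would use.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import ceil
from statistics import median


@dataclass(frozen=True)
class FileRecord:
    # Hypothetical record; the field names are assumptions for illustration.
    table_format: str   # e.g. "iceberg", "mixed-iceberg", "mixed-hive"
    content: str        # "data" or "metadata"
    in_use: bool        # False once the file has expired
    file_type: str      # e.g. "data", "eq-del", "pos-del", "manifest"
    partition: str      # "" for an unpartitioned table
    size_bytes: int     # the measure that the facts are computed from


def nearest_rank_percentile(sorted_sizes, pct):
    """Return the nearest-rank percentile of an ascending, non-empty list."""
    rank = max(1, ceil(pct / 100 * len(sorted_sizes)))
    return sorted_sizes[rank - 1]


def summarize(files):
    """Group files by their dimension tuple and compute the facts per group."""
    groups = defaultdict(list)
    for f in files:
        key = (f.table_format, f.content, f.in_use, f.file_type, f.partition)
        groups[key].append(f.size_bytes)

    summary = {}
    for key, sizes in groups.items():
        sizes.sort()
        summary[key] = {
            "file_count": len(sizes),
            "total_size": sum(sizes),
            "p90_size": nearest_rank_percentile(sizes, 90),
            "median_size": median(sizes),
            "max_size": sizes[-1],
        }
    return summary
```

Each key of the returned dict is one combination of dimension values, and each value holds the five facts listed above for that combination.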
   
   Here is our experience since 0.4:
   (let me bring it from my work computer on Monday...)
   
   Furthermore, referencing the metrics used in Iceberg, as czy006 suggested, 
might be a good idea, since it would give us a more detailed view of the data 
inside the table. It needs more consideration when the table format is mixed Hive, though.
   
   We also need to consider the capability of the Prometheus reporter, since we 
have already run into a pitfall here: a large number of metrics on a single page 
causes performance issues.
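
As a rough way to reason about that risk, the sketch below estimates an upper bound on how many time series a single metrics page would expose if every dimension became a label. The function name and all the numbers in the usage note are made up for illustration; they are not measurements from Amoro.

```python
from math import prod


def estimate_series(label_cardinalities, fact_count, table_count):
    """Upper bound on exposed series: every label combination, per fact, per table.

    label_cardinalities maps a (hypothetical) label name to the number of
    distinct values it can take within one table.
    """
    per_table = prod(label_cardinalities.values()) * fact_count
    return per_table * table_count
```

For instance, with assumed cardinalities of 2 (data/metadata), 2 (expired/in use), 5 (file types) and 100 (partitions), 5 facts, and 1000 tables, `estimate_series({"content": 2, "in_use": 2, "file_type": 5, "partition": 100}, 5, 1000)` gives 10,000,000. Not every combination exists in practice, so this is a ceiling rather than a prediction, but it shows how quickly a per-partition label multiplies the page size.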
   
   In addition, I totally agree with klion26, so that we can better understand 
what is going on while self-optimizing is running (e.g. add them to 
OptimizerGroupMetric?).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
