hhippodnsla commented on issue #3104: URL: https://github.com/apache/amoro/issues/3104#issuecomment-2294752688
In my opinion, we can borrow concepts from data warehousing (DW), such as identifying the dimensions and the facts.

For example, the dimensions could be:

* Which table type? Pure Iceberg, mixed Iceberg, or mixed Hive.
* Which kind of data? Data or metadata.
* Is it in use? Are the files expired or still in use?
* Which file type for data/metadata? data/eq-del/pos-del/manifest/manifest-list and so on...
* Is the table partitioned, and if so, which partition?

The facts could be:

* number of files
* total size of files
* 90th-percentile file size
* median file size
* max file size

Here is our experience since 0.4: (let me bring it from my working computer on Monday....)

Furthermore, the idea of referencing the metrics used in Iceberg, as czy006 suggested, might be a good one, since it would give us a more detailed view of the data inside the table. It needs more consideration when the table format is mixed Hive, though.

We also need to consider the capability of the Prometheus reporter, since we have already run into a pitfall here: a large number of metrics on a single page causes performance issues.

On the other hand, I totally agree with klion26, so that we can better understand what is happening while self-optimizing is running (e.g. add it to OptimizerGroupMetric?).

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
