zhangyue19921010 commented on PR #6600: URL: https://github.com/apache/hudi/pull/6600#issuecomment-1318166797
> @zhangyue19921010 thanks for taking this up! Some high level thoughts:
>
> * **hudi commit metadata vs hudi metrics**: if users enable diagnostic reporter, should we have a config to include metrics reporter's data? metrics system is good at showing the trends but hard to cross-check against commit metadata. so regardless of enabling metrics reporter or not, diagnostic reporter can collect metrics and save to report dir, just like a csv/json metrics reporter. We can also refine what goes to metrics and what goes to commit metadata, to keep the responsibilities clear and reporting data organized.
> * **consolidate with error table**: [RFC-20](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+20+%3A+handle+failed+records) this is a long-pending feature that also aims to assist investigation. diagnostic reporter should be aware of error table settings and zip the error table if configured so. Size could be a concern, so configuration can be given to zip the whole table, or sample records, or skip error table completely. Also it requires some config to allow masking any fields. Taking a step further, we can also make error table one of the diagnostic reporting features. They have similar storage structures: can be local to the hudi table or global to the whole platform.
> * **work with metadata table**: you've already mentioned collecting stats by listing the file system. diagnostic reporter should also be aware of the presence of metadata table and zip the table or extract relevant data - fallback to file system listing if not present.

Thanks @xushiyan for your advice! Will have a deep look and expand this rfc asap!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
