Alex and I have PRs out related to supporting metrics in portable-runner code-paths:
- #7624 <https://github.com/apache/beam/pull/7624> associates metrics in the SDK harness with the (pre-fusion) PTransforms the user defined them in.
- #7641 <https://github.com/apache/beam/pull/7641> sends metrics over the "Job API" (between job server and portable runner):
  - Flink portable-VR metrics tests pass (Java)
  - metrics print()s work in portable wordcount (Python)

*Open Questions:*

- What to do with type-specific protos (e.g. IntDistributionData vs. DoubleDistributionData)?
  - I think Alex and I were leaning toward only supporting the "int"-cases for now
  - That's what Java does in its existing metrics <https://github.com/apache/beam/pull/7641#discussion_r251895392>
- "MetricKey" and "MetricName" semantics:
  - These exist in Java and Python, and I added proto versions in #7641 <https://github.com/apache/beam/pull/7641/files#r251896650>.
  - MetricName wraps "namespace" and "name" strings, and MetricKey wraps a "step (ptransform) name" and a MetricName.
  - PCollection-scoped metrics (e.g. element count) are identified by a null "step name" in #7624 <https://github.com/apache/beam/pull/7624> and #7641 <https://github.com/apache/beam/pull/7641>.
  - Alex and I discussed using URNs as the source of this information instead:
    - "step name" can instead come from a MonitoringInfo's PTRANSFORM label <https://github.com/apache/beam/blob/efb83e6c2fe486793947f6a80bec3a61f53a06bb/model/fn-execution/src/main/proto/beam_fn_api.proto#L436>, while "namespace" and "name" can be parsed from its URN <https://github.com/apache/beam/blob/efb83e6c2fe486793947f6a80bec3a61f53a06bb/model/fn-execution/src/main/proto/beam_fn_api.proto#L457-L482>.
    - URNs could encode these over the wire, then SDKs could convert to existing MetricKey/MetricNames for use in querying / MetricResults
    - or: we could more deeply overhaul SDKs' metrics/querying structures to use MonitoringInfos / URNs.
    - at the least, SDKs should get helpers for querying for Alex's new "system metrics" (e.g. element count, various timings <https://github.com/apache/beam/blob/efb83e6c2fe486793947f6a80bec3a61f53a06bb/model/fn-execution/src/main/proto/beam_fn_api.proto#L457-L482>) that are associated with specific URNs
- Gauges: the protos have a nod to sending gauges over the wire as counters <https://github.com/apache/beam/blob/efb83e6c2fe486793947f6a80bec3a61f53a06bb/model/fn-execution/src/main/proto/beam_fn_api.proto#L506-L515>
  - are there problems with that?
  - #7641 should support this <https://github.com/apache/beam/pull/7641/files#r251930798>, for now.
- ExtremaData: the protos contain these <https://github.com/apache/beam/blob/efb83e6c2fe486793947f6a80bec3a61f53a06bb/model/fn-execution/src/main/proto/beam_fn_api.proto#L517-L532>, but SDKs don't support them (afaik).

Alex likely has more to add, and we plan to make a doc about these changes, but I wanted to post here first in case others have thoughts or we are overlooking anything.

Thanks!
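
To make the URN option above a bit more concrete, here is a rough Python sketch of converting a MonitoringInfo's URN and PTRANSFORM label into the existing MetricKey/MetricName shape. The "beam:metric:user:<namespace>:<name>" URN layout, the helper name, and the tuple types are assumptions for illustration only, not the actual proto or SDK definitions; it also assumes namespaces contain no colons.

```python
from typing import Dict, NamedTuple, Optional

# Hypothetical URN prefix for user metrics (an assumption, not taken from the protos).
USER_METRIC_URN_PREFIX = "beam:metric:user:"


class MetricName(NamedTuple):
    """Wraps the "namespace" and "name" strings."""
    namespace: str
    name: str


class MetricKey(NamedTuple):
    """Wraps a step (ptransform) name and a MetricName.

    step is None for PCollection-scoped metrics.
    """
    step: Optional[str]
    metric: MetricName


def metric_key_from_monitoring_info(urn: str, labels: Dict[str, str]) -> MetricKey:
    """Builds a MetricKey from a URN plus labels (hypothetical helper)."""
    if not urn.startswith(USER_METRIC_URN_PREFIX):
        raise ValueError("not a user-metric URN: %r" % urn)
    rest = urn[len(USER_METRIC_URN_PREFIX):]
    # Split on the first colon: everything before it is the namespace.
    namespace, sep, name = rest.partition(":")
    if not sep or not namespace or not name:
        raise ValueError("expected <namespace>:<name> in URN: %r" % urn)
    # A missing PTRANSFORM label maps to a null step name.
    return MetricKey(step=labels.get("PTRANSFORM"),
                     metric=MetricName(namespace, name))
```

With something like this, SDKs could keep their current querying / MetricResults code and only change how keys are derived from what comes over the wire.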