[ 
https://issues.apache.org/jira/browse/BEAM-3310?focusedWorklogId=81203&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81203
 ]

ASF GitHub Bot logged work on BEAM-3310:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Mar/18 15:38
            Start Date: 16/Mar/18 15:38
    Worklog Time Spent: 10m 
      Work Description: echauchot commented on issue #4548: [BEAM-3310] Metrics 
pusher
URL: https://github.com/apache/beam/pull/4548#issuecomment-373752498

   Review status: 0 of 26 files reviewed at latest revision, 6 unresolved 
discussions, some commit checks failed.
   
   ---
   
   
*[runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsPusher.java,
 line 79 at 
r3](https://reviewable.io:443/reviews/apache/beam/4548#-L4mWhVmcivKjpDBM54k:-L6X-cVDGhjq1P1ALGWL:bkzx7tm)
 ([raw 
file](https://github.com/apache/beam/blob/452d524e9e93c8e415be809a41117be2367e64d4/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsPusher.java#L79)):*
   <details><summary><i>Previously, swegner (Scott Wegner) 
wrote…</i></summary><blockquote>
   
   Good question. This is similar to the situation with DirectRunner, and I had 
to look it up to see how it works: 
https://github.com/apache/beam/blob/081de891e05a6c84db0f3fd03e3ad95f362d5c4c/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java#L283
   
   It seems that the `DefaultValueFactory` implementation uses reflection to 
instantiate DirectRunner at runtime without a compile-time reference. We could 
do something similar here, with the requirement that MetricsHttpSink is 
either on the classpath (via runners-core), or a different sink is configured.
   </blockquote></details>
   
   OK, I'll use reflection and put MetricsSink in the SDK, as is done for 
PipelineRunner. The implementations of MetricsSink can stay in runners-core.
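   As an illustration of the reflection pattern discussed here, a minimal 
sketch; the class name, method, and error messages are hypothetical, not 
Beam's actual `DefaultValueFactory` API:

```java
/**
 * Hypothetical sketch of the reflection pattern: instantiate an implementation
 * by its fully-qualified class name, so the SDK needs no compile-time reference
 * to it. All names here are illustrative, not Beam's actual API.
 */
public class ReflectiveSinkFactory {
  public static Object createByName(String className) {
    try {
      // Look the class up on the classpath and call its no-arg constructor.
      return Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ClassNotFoundException e) {
      throw new IllegalArgumentException(
          className + " is not on the classpath (is runners-core a runtime dependency?)", e);
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Unable to instantiate " + className, e);
    }
  }
}
```

   This is why the requirement above exists: the class is resolved only at 
runtime, so a missing runners-core dependency surfaces as a lookup failure 
rather than a compile error.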
   
   ---
   
   
*[runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsPusher.java,
 line 82 at 
r3](https://reviewable.io:443/reviews/apache/beam/4548#-L4mXJjriEP9_vXEL0Ef:-L6aX3NXGH-H1_daXY7-:b-wy47yr)
 ([raw 
file](https://github.com/apache/beam/blob/452d524e9e93c8e415be809a41117be2367e64d4/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsPusher.java#L82)):*
   <details><summary><i>Previously, swegner (Scott Wegner) 
wrote…</i></summary><blockquote>
   
   That's a good idea. Note that if you just need an implementation for test 
asserts you can use a mock implementation as well.
   </blockquote></details>
   
   OK, I have split DummyMetricsSink into NoOpMetricsSink (the default sink) 
and TestMetricsSink (used for tests), and I no longer start the polling thread 
or push when the NoOpMetricsSink is configured.
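   A rough sketch of that split; only the names NoOpMetricsSink and 
TestMetricsSink come from the comment above, everything else (the nested 
interface, signatures, the guard) is assumed for illustration:

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of the NoOp/Test sink split and the polling-thread guard. */
public class MetricsPusherSketch {
  interface Sink {
    void write(String serializedMetrics);
  }

  /** Default sink: nothing is pushed, so the pusher skips its polling thread. */
  static class NoOpMetricsSink implements Sink {
    @Override public void write(String serializedMetrics) { /* intentionally empty */ }
  }

  /** Test sink: records pushed metrics so tests can assert on them. */
  static class TestMetricsSink implements Sink {
    final List<String> written = new ArrayList<>();
    @Override public void write(String serializedMetrics) { written.add(serializedMetrics); }
  }

  /** Only start the background polling thread when pushing can have an effect. */
  static boolean shouldStartPollingThread(Sink sink) {
    return !(sink instanceof NoOpMetricsSink);
  }
}
```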
   
   ---
   
   *[sdks/java/build-tools/src/main/resources/beam/findbugs-filter.xml, line 42 
at 
r3](https://reviewable.io:443/reviews/apache/beam/4548#-L4mb9MCqXj2PrgZGqt6:-L6VjD2dvP4FOsWIwwun:bub3ovg)
 ([raw 
file](https://github.com/apache/beam/blob/452d524e9e93c8e415be809a41117be2367e64d4/sdks/java/build-tools/src/main/resources/beam/findbugs-filter.xml#L42)):*
   <details><summary><i>Previously, swegner (Scott Wegner) 
wrote…</i></summary><blockquote>
   
   I tried removing this and running locally and I don't see the FindBugs 
warning.  Perhaps this is no longer relevant?
   
   If you are still seeing an error, can you paste it here? For reference, I am 
building via: `mvn clean install -T1C -DskipTests -Dcheckstyle.skip -am  -pl 
runners/core-java/`
   </blockquote></details>
   
   If I remove the exception from the FindBugs filter and run 
`mvn clean install -Prelease` in runners/core-java, I still get:
   `Redundant null check at MetricsPusher.java:[line 61] 
RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE`
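   For reference, a minimal hypothetical example of the warning class FindBugs 
is reporting (not the actual code at MetricsPusher.java line 61): the value is 
dereferenced first, so the later null check is provably redundant.

```java
public class RedundantNullCheckExample {
  public static int trimmedLength(String s) {
    String trimmed = s.trim(); // dereferences s, so s is known non-null past this point
    if (s == null) {           // FindBugs: RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE
      return 0;
    }
    return trimmed.length();
  }
}
```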
   
   ---
   
   
*[runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/JsonMetricsSerializer.java,
 line 29 at 
r3](https://reviewable.io:443/reviews/apache/beam/4548#-L4lluU2UEFYijwXu-hj:-L7iF-6UtWhYh16oCUMz:b-rqyxqb)
 ([raw 
file](https://github.com/apache/beam/blob/452d524e9e93c8e415be809a41117be2367e64d4/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/JsonMetricsSerializer.java#L29)):*
   <details><summary><i>Previously, swegner (Scott Wegner) 
wrote…</i></summary><blockquote>
   
   I disagree that manual json serialization is the right way to go. This code 
is brittle and cumbersome to maintain.
   
   Brittle: Does this serialization code correctly handle edge cases, such as 
null values, encoding special characters and very large numeric values? It's 
unclear, and writing a sufficient set of tests to validate each field would be 
tedious.
   
   Cumbersome: Making changes to this or related code will be difficult. What 
if the MetricQueryResults schema changes, or we want to add optional 
pretty-printing for debugging, or we want to make some fields optional for 
serialization? Starting with a real serialization library would make these 
changes easy. Rolling our own makes the code hard to maintain.
   
   If MetricQueryResults is poorly suited for serialization libraries, that 
means that consumers of the data will need manual deserialization as well. We 
should ensure that the data model is easy to consume, making modifications to 
MetricQueryResults if necessary. One idea is to define a POJO implementation of 
the MetricsQueryResults interface to use for serialization/deserialization. 
Another would be to use protobuf to define a language-agnostic schema with 
auto-generated language bindings and serialization format. This is what the 
FnAPI in the Beam portability framework is using: 
https://github.com/google/protobuf
   
   Mentioning a few people who have previously chimed in on the dev list: 
@lukecwik @robertwb @rmannibucau
   </blockquote></details>
   
   Fair enough, I agree.
   So I moved the MetricsHttpSink POC to a new runner-extensions-java module 
that depends on Jackson, and I used Jackson for serialization.
   
   Regarding your question about MetricQueryResults being poorly suited for 
serialization libraries: MetricQueryResults is actually almost ready for 
out-of-the-box serialization. The classes are already POJOs; the only thing 
they lack is accessors named get* so that default object mappers can discover 
them. So, as you suggested, I preferred renaming the getters in 
MetricQueryResults and related classes rather than adding new POJO classes 
that do exactly the same thing but with get* names. I isolated this change in 
its own commit so that it can be cherry-picked into another PR if needed.
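   The getter-naming point can be demonstrated with the JDK's own JavaBeans 
introspection, whose get*/is* convention default object mappers follow. The 
Metric POJO below is hypothetical, not Beam's MetricQueryResults:

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

/**
 * Shows why get*-style accessors matter: only the accessor following the
 * JavaBeans convention is discovered as a property.
 */
public class GetterConvention {
  public static class Metric {
    private final String name = "counter1";
    public String name() { return name; }    // NOT discovered as a property
    public String getName() { return name; } // discovered as property "name"
  }

  /** Lists the bean property names that introspection (and mappers) would see. */
  public static List<String> beanProperties(Class<?> clazz) {
    try {
      List<String> props = new ArrayList<>();
      for (PropertyDescriptor pd
          : Introspector.getBeanInfo(clazz, Object.class).getPropertyDescriptors()) {
        props.add(pd.getName());
      }
      return props;
    } catch (IntrospectionException e) {
      throw new RuntimeException(e);
    }
  }
}
```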
   
   ---
   
   
*[runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsSink.java,
 line 28 at 
r3](https://reviewable.io:443/reviews/apache/beam/4548#-L4mZncWXgIUYuM5Qf8j:-L6WFJd5xKRkRRaoHwZB:bnel6nh)
 ([raw 
file](https://github.com/apache/beam/blob/452d524e9e93c8e415be809a41117be2367e64d4/runners/core-java/src/main/java/org/apache/beam/runners/core/metrics/MetricsSink.java#L28)):*
   <details><summary><i>Previously, swegner (Scott Wegner) 
wrote…</i></summary><blockquote>
   
   Is there an actual use-case for HttpSink which requires different 
serializers? If HttpSink is just a proof-of-concept implementation then I don't 
think there's a need to focus too much on customizability.
   
   I could imagine the functionality in HttpSink being useful as a base for 
other sinks like Graphite. In that case, we should ensure the common 
functionality is easy to pull out and reuse within a Graphite sink. For that, 
having the concept of a MetricsSerializer is useful, but I don't know if it 
necessarily needs this base class.
   </blockquote></details>
   
   Indeed, HttpSink is more of a POC. If we wanted to use it as a base class 
for other sinks like Graphite, the reusable features would be:
   - serializing a metric: this can be done in the metric POJO you were 
talking about
   - issuing an HTTP request: there is not much to reuse here, especially if 
Graphite requests (for example) need extra parameters or headers (such as 
authentication)
   => So I agree it might be over-designed. I'll convert MetricsSink to an 
interface with a single writeMetrics method.
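   The resulting interface could look like this minimal sketch; the method 
name writeMetrics is from the comment above, while the parameter type (a 
pre-serialized String payload) is an assumption:

```java
/**
 * Single-method sink contract: implementations (HTTP, Graphite, no-op, ...)
 * decide what to do with each batch of serialized metrics pushed to them.
 */
public interface MetricsSink {
  void writeMetrics(String serializedMetrics);
}
```

   A nice side effect of the single-method shape is that test doubles are one 
lambda away, e.g. `MetricsSink sink = metrics -> captured.add(metrics);`.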
   
   ---
   
   
   *Comments from 
[Reviewable](https://reviewable.io:443/reviews/apache/beam/4548)*
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 81203)
    Time Spent: 1.5h  (was: 1h 20m)

> Push metrics to a backend in a runner-agnostic way
> ---------------------------------------------------
>
>                 Key: BEAM-3310
>                 URL: https://issues.apache.org/jira/browse/BEAM-3310
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-core
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
>            Priority: Major
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The idea is to avoid relying on the runners to provide access to the metrics 
> (either at the end of the pipeline or while it runs), because they do not all 
> have the same metrics capabilities (e.g. the Spark runner configures sinks 
> like CSV, Graphite or in-memory sinks using the Spark engine conf). The 
> target is to push the metrics from the common runner code so that, no matter 
> the chosen runner, a user can get their metrics out of Beam.
> Here is the link to the discussion thread on the dev ML: 
> https://lists.apache.org/thread.html/01a80d62f2df6b84bfa41f05e15fda900178f882877c294fed8be91e@%3Cdev.beam.apache.org%3E
> And the design doc:
> https://s.apache.org/runner_independent_metrics_extraction



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
