kennknowles opened a new issue, #18424:
URL: https://github.com/apache/beam/issues/18424

   In https://github.com/apache/beam/pull/2838 aggregators were removed from 
the Spark runner. This caused a regression in the dropped-window counters and 
logs.
   
   `CounterCell` instances are created ad hoc instead of using the `Metrics` 
class static factory methods: 
[SparkGroupAlsoByWindowViaWindowSet.java#L213-L219](https://github.com/apache/beam/blob/v2.1.0/runners/spark/src/main/java/org/apache/beam/runners/spark/stateful/SparkGroupAlsoByWindowViaWindowSet.java#L213-L219)
   The context in which the metrics are reported isn't taken into account. 
The counters are passed to a lazily evaluated iterator: 
[SparkGroupAlsoByWindowViaWindowSet.java#L221-L223](https://github.com/apache/beam/blob/v2.1.0/runners/spark/src/main/java/org/apache/beam/runners/spark/stateful/SparkGroupAlsoByWindowViaWindowSet.java#L221-L223)
   As a result, the subsequent code that inspects the counters always reads 
them immediately after initialization, before they are populated. The 
conditional statements never see the populated values, so the log statements 
never fire: 
[SparkGroupAlsoByWindowViaWindowSet.java#L323-L333](https://github.com/apache/beam/blob/v2.1.0/runners/spark/src/main/java/org/apache/beam/runners/spark/stateful/SparkGroupAlsoByWindowViaWindowSet.java#L323-L333).
   What we want is for these counts to be exposed as metrics as well as logs.
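
   The lazy-evaluation problem described above can be reproduced outside Beam. 
The following is a minimal sketch in plain Java, not the runner's actual code: 
`LazyCounterDemo`, `dropNegatives`, and the use of `AtomicLong` in place of 
`CounterCell` are all hypothetical stand-ins chosen for illustration. It shows 
that a counter handed to a lazy iterator is still zero when read right after 
the iterator is constructed, and is only populated once the iterator has been 
consumed.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicLong;

public class LazyCounterDemo {

  /**
   * Hypothetical stand-in for the lazily evaluated iterator: negative values
   * play the role of expired windows and are counted only when the iterator
   * actually advances past them.
   */
  static Iterator<Integer> dropNegatives(Iterator<Integer> input, AtomicLong dropped) {
    return new Iterator<Integer>() {
      private Integer next;
      private boolean ready;

      // Pull from the input until a non-negative value is found or input ends.
      private void advance() {
        while (!ready && input.hasNext()) {
          Integer candidate = input.next();
          if (candidate < 0) {
            dropped.incrementAndGet(); // side effect happens lazily
          } else {
            next = candidate;
            ready = true;
          }
        }
      }

      @Override
      public boolean hasNext() {
        advance();
        return ready;
      }

      @Override
      public Integer next() {
        advance();
        if (!ready) {
          throw new NoSuchElementException();
        }
        ready = false;
        return next;
      }
    };
  }

  /** Returns the counter value read before and after consuming the iterator. */
  static long[] run() {
    AtomicLong dropped = new AtomicLong();
    Iterator<Integer> out = dropNegatives(Arrays.asList(1, -2, 3, -4).iterator(), dropped);

    // Reading here mirrors the bug: nothing has been evaluated yet.
    long before = dropped.get();

    while (out.hasNext()) {
      out.next();
    }

    // Only after consumption does the counter reflect the dropped elements.
    long after = dropped.get();
    return new long[] {before, after};
  }

  public static void main(String[] args) {
    long[] result = run();
    System.out.println("before=" + result[0] + " after=" + result[1]);
  }
}
```

   In this sketch, any check such as `if (dropped.get() > 0)` placed where 
`before` is read can never be true, which matches the behavior reported for the 
log statements.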
   
   Additionally, 
`org.apache.beam.runners.core.LateDataUtils#dropExpiredWindows` now takes a 
`CounterCell` as a parameter. `CounterCell` is a metrics implementation class 
and should generally not be used elsewhere (this is also mentioned in its 
Javadoc). We should look into changing this method to use something else, and 
perhaps make `CounterCell` and similar classes package-private (and move the 
runner code that uses them into the same package).
   
   Imported from Jira 
[BEAM-2812](https://issues.apache.org/jira/browse/BEAM-2812). Original Jira may 
contain additional context.
   Reported by: aviemzur.

