[
https://issues.apache.org/jira/browse/BEAM-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ahmet Altay updated BEAM-4518:
------------------------------
Fix Version/s: (was: 2.11.0)
> Errors when running Python Game stats with a low fixed_window_duration in the
> DirectRunner
> ------------------------------------------------------------------------------------------
>
> Key: BEAM-4518
> URL: https://issues.apache.org/jira/browse/BEAM-4518
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Affects Versions: 2.5.0, 2.6.0
> Reporter: Ahmet Altay
> Priority: Critical
> Labels: flake
>
> Using the injector and the following command to start the DirectRunner
> pipeline:
> python -m apache_beam.examples.complete.game.game_stats \
> --project=google.com:clouddfe \
> --topic projects/google.com:clouddfe/topics/leader_board-$USER-topic-1 \
> --dataset ${USER}_test --fixed_window_duration 1
> Fails with:
> ValueError: PCollection of size 2 with more than one element accessed as a
> singleton view. First two elements encountered are "13.93", "11.7777777778".
> [while running 'CalculateSpammyUsers/ProcessAndFilter']
> Offending code is here:
> global_mean_score = (
> sum_scores
> | beam.Values()
> | beam.CombineGlobally(beam.combiners.MeanCombineFn())\
> .as_singleton_view())
> # Filter the user sums using the global mean.
> filtered = (
> sum_scores
> # Use the derived mean total score (global_mean_score) as a side input.
> | 'ProcessAndFilter' >> beam.Filter(
> lambda key_score, global_mean:\
> key_score[1] > global_mean * self.SCORE_WEIGHT,
> global_mean_score))
> Since global_mean_score is the result of CombineGlobally, this is either an
> issue with CombineGlobally or side inputs implementation in DirectRunner. The
> latter is more likely since it works on DataflowRunner.
> cc: [~mariagh]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)