[
https://issues.apache.org/jira/browse/BEAM-11217?focusedWorklogId=655580&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-655580
]
ASF GitHub Bot logged work on BEAM-11217:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 27/Sep/21 16:10
Start Date: 27/Sep/21 16:10
Worklog Time Spent: 10m
Work Description: lostluck commented on pull request #15482:
URL: https://github.com/apache/beam/pull/15482#issuecomment-928033815
I've determined the cause of the failures, which were only visible because of
this PR. It turns out the wordcount test is bad.
Here's what's happening:
1. As written, the wordcount test [writes the test data to an in memory
filesystem](https://github.com/apache/beam/blob/master/sdks/go/test/integration/wordcount/wordcount_test.go#L83).
This file system is only available in-process. This means that it will only be
available when executing on the direct runner, or in LOOPBACK mode.
2. The question then becomes: Why doesn't the test fail if there are no
files? The problem is in how the [textio is
implemented](https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/textio/textio.go#L76):
It uses the filename as a glob to list matching files, and then emits them.
It doesn't see "no files" as a failure case. Whether that's actually a bug is a
separate concern.
3. Either way, no file names are sent downstream, so subsequent DoFns
don't execute, and no counters are reported at all.
My tip-off was that clearly the pipelines were executing, and Flink was
returning counters for all the PCollections, but not for any PTransforms (or at
least, not for extract).
The correct move for now is to create a new `WordCountFromPCol` function in
the integration version of the wordcount package. It will do everything after
the `textio.Read` in the WordCount function, but take in a scope and a
PCollection as input instead of the glob. The existing WordCount should call
this new function instead of having everything duplicated.
In the tests, instead of writing the data to an in-memory file, we write it
using `beam.Create` (or `beam.CreateList`), and pass in the PCollection. At
that point the test should operate properly for all runners.
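The shape of that refactor can be sketched in plain Go, without the Beam SDK. This is an analogy under stated assumptions, not the actual change: `countWords` stands in for the proposed `WordCountFromPCol` (it takes already-loaded lines, the way the new function would take a PCollection), and `countWordsFromGlob` stands in for the existing `WordCount` (it only does the reading and then delegates). Both names are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// countWords knows nothing about where the lines came from, so a test can
// feed it in-memory data directly.
func countWords(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		for _, w := range strings.Fields(line) {
			counts[w]++
		}
	}
	return counts
}

// countWordsFromGlob reads every file matching glob, then delegates to
// countWords instead of duplicating the counting logic.
func countWordsFromGlob(glob string) (map[string]int, error) {
	files, err := filepath.Glob(glob)
	if err != nil {
		return nil, err
	}
	var lines []string
	for _, name := range files {
		f, err := os.Open(name)
		if err != nil {
			return nil, err
		}
		sc := bufio.NewScanner(f)
		for sc.Scan() {
			lines = append(lines, sc.Text())
		}
		f.Close()
		if err := sc.Err(); err != nil {
			return nil, err
		}
	}
	return countWords(lines), nil
}

func main() {
	// The test-style call path: no files involved at all.
	counts := countWords([]string{"a b a", "b c"})
	fmt.Println(counts["a"], counts["b"], counts["c"]) // prints "2 2 1"
}
```

The point of the split is that the test no longer depends on a filesystem that only exists in the test process.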
Issue Time Tracking
-------------------
Worklog Id: (was: 655580)
Time Spent: 3h 50m (was: 3h 40m)
> Implement metrics filtering
> ---------------------------
>
> Key: BEAM-11217
> URL: https://issues.apache.org/jira/browse/BEAM-11217
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-go
> Reporter: Kamil Wasilewski
> Assignee: Ritesh Ghorse
> Priority: P3
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> `metrics.Results` misses a method for querying metrics using a provided
> filter. The method should take a filter object as an argument and return
> `metrics.QueryResults` object containing metrics that matched the filter.