+1 (binding) Rebased fork and ran internal performance tests.
While doing so, I ran into the unit test issue below with the fn_runner (Python direct runner), which did not occur with 2.21 [1]. That processing-time timers are not supported wasn't an issue previously, because the timer, though declared, wasn't exercised in the unit test.

Is there a plan/JIRA to support processing-time timers with the direct runner?

Thanks,
Thomas

[1] https://gist.github.com/tweise/6f8ca6341711f579b0ed9943b8f25138#file-synthetic_stateful-py-L250

/code/venvs/venv/lib/python3.6/site-packages/apache_beam/pipeline.py:555: in __exit__
    self.result = self.run()
/code/venvs/venv/lib/python3.6/site-packages/apache_beam/pipeline.py:521: in run
    allow_proto_holders=True).run(False)
/code/venvs/venv/lib/python3.6/site-packages/apache_beam/pipeline.py:534: in run
    return self.runner.run_pipeline(self, self._options)
/code/venvs/venv/lib/python3.6/site-packages/apache_beam/runners/direct/direct_runner.py:119: in run_pipeline
    return runner.run_pipeline(pipeline, options)
/code/venvs/venv/lib/python3.6/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:176: in run_pipeline
    pipeline.to_runner_api(default_environment=self._default_environment))
/code/venvs/venv/lib/python3.6/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:182: in run_via_runner_api
    self._check_requirements(pipeline_proto)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <apache_beam.runners.portability.fn_api_runner.fn_runner.FnApiRunner object at 0x7fd2a896b400>
pipeline_proto = components {
  transforms {
    key: "ref_AppliedPTransform_AppliedPTransform_1"
    value {
      subtransforms: "ref...}
}
root_transform_ids: "ref_AppliedPTransform_AppliedPTransform_1"
requirements: "beam:requirement:pardo:stateful:v1"

    def _check_requirements(self, pipeline_proto):
      """Check that this runner can satisfy all pipeline requirements."""
      supported_requirements = set(self.supported_requirements())
      for requirement in pipeline_proto.requirements:
        if requirement not in supported_requirements:
          raise ValueError(
              'Unable to run pipeline with requirement: %s' % requirement)
      for transform in pipeline_proto.components.transforms.values():
        if transform.spec.urn == common_urns.primitives.TEST_STREAM.urn:
          raise NotImplementedError(transform.spec.urn)
        elif transform.spec.urn in translations.PAR_DO_URNS:
          payload = proto_utils.parse_Bytes(
              transform.spec.payload, beam_runner_api_pb2.ParDoPayload)
          for timer in payload.timer_family_specs.values():
            if timer.time_domain != beam_runner_api_pb2.TimeDomain.EVENT_TIME:
>             raise NotImplementedError(timer.time_domain)
E             NotImplementedError: 2

/code/venvs/venv/lib/python3.6/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:283: NotImplementedError

On Thu, Sep 10, 2020 at 4:41 PM Robert Bradshaw wrote:

> Given the additional information, I am upgrading my vote to +1 (binding)
> based on my prior analysis.
>
> On Thu, Sep 10, 2020 at 4:14 PM Kyle Weaver wrote:
>
>> +1 (non-binding)
>>
>> Validated wordcount with Python 3.7.8 and Flink 1.10.0 (both loopback and
>> using the Docker image). Also Python 3.7.8 loopback with an embedded Spark
>> cluster.
>>
>> On Thu, Sep 10, 2020 at 2:32 PM Daniel Oliveira wrote:
>>
>>> By the way, most of the validation so far has covered Direct runner and
>>> Dataflow, but Flink and Spark still have little validation, so if anyone
>>> can help with those it will help speed up the release.
>>>
>>> On Thu, Sep 10, 2020 at 2:12 PM Daniel Oliveira wrote:
>>>
>>>> So I tracked the --temp_location issue down to
>>>> https://github.com/apache/beam/pull/12203 and asked @Pablo Estrada
>>>> and @Chamikara Jayalath about it.
>>>> It's not exactly a bug, but an intended change in requirements for
>>>> WriteToBigQuery, so the only fix I'll need to do is update the test script
>>>> with the appropriate flag, which should be easy. It also won't require
>>>> building a new release candidate.
>>>>
>>>> There is a possibility that user pipelines will break if they're using
>>>> BigQuery with the Python Direct Runner, so I'll add a note to the changelog
>>>> about it, but I don't think the change is significant enough to need
>>>> anything beyond that.
>>>>
>>>> On Thu, Sep 10, 2020 at 1:47 PM Chamikara Jayalath wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> Thanks,
>>>>> Cham
>>>>>
>>>>> On Thu, Sep 10, 2020 at 11:26 AM Ahmet Altay wrote:
>>>>>
>>>>>> +1 - validated py3 quickstarts. The problem I mentioned earlier is
>>>>>> resolved.
>>>>>>
>>>>>> On Wed, Sep 9, 2020 at 7:46 PM Daniel Oliveira wrote:
>>>>>>
>>>>>>> Good news: According to
>>>>>>> https://ci-beam.apache.org/job/beam_PostRelease_Python_Candidate/188/consoleFull
>>>>>>> the Streaming Wordcount quickstart works for Dataflow with Python 2.7.
>>>>>>> So it looks like the container issue might be fixed.
>>>>>>>
>>>>>>> Bad news: That same Jenkins job failed on "Running HourlyTeamScore
>>>>>>> example with DirectRunner" because it's missing a --temp_location flag,
>>>>>>> despite using the DirectRunner. This looks like a bug, but I'm still
>>>>>>> investigating whether it'll need another cherry-pick and RC to fix or if
>>>>>>> the validation script just needs to be updated. I'll update the thread
>>>>>>> if I find anything.
>>>>>>>
>>>>>>
>>>>>> Probably it does not require a cherry-pick. We have not validated
>>>>>> that workflow in the past few releases.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On Wed, Sep 9, 2020 at 4:58 PM Daniel Oliveira wrote:
>>>>>>>
>>>>>>>> The Dataflow Python Batch worker issue should be fixed now.
>>>>>>>> I tried verifying it myself via the RC validation script, but I've
>>>>>>>> been having some trouble with the GCP authentication, so if someone
>>>>>>>> else can validate it, that would be a big help.
>>>>>>>>
>>>>>>>> On Tue, Sep 8, 2020 at 5:51 PM Robert Bradshaw wrote:
>>>>>>>>
>>>>>>>>> I verified the signatures and all the artifacts are correct, and
>>>>>>>>> tested a wheel in a fresh virtual environment. It'd be good to see the
>>>>>>>>> Dataflow issue confirmed as fixed though.
>>>>>>>>>
>>>>>>>>> On Tue, Sep 8, 2020 at 5:17 PM Valentyn Tymofieiev wrote:
>>>>>>>>>
>>>>>>>>>> This error comes from the Dataflow Python Batch worker.
>>>>>>>>>>
>>>>>>>>>> Streaming workflows use the SDK worker, which is provided by the
>>>>>>>>>> apache-beam library, so the versions will match.
>>>>>>>>>>
>>>>>>>>>> The error should be fixed by setting the correct Dataflow worker
>>>>>>>>>> version in Dataflow containers, and does not affect Beam RC.
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 8, 2020 at 4:52 PM Ahmet Altay wrote:
>>>>>>>>>>
>>>>>>>>>>> -1 - I validated py3 quickstarts on dataflow and direct runner.
>>>>>>>>>>> I ran into 1 issue with batch workflows on dataflow:
>>>>>>>>>>>
>>>>>>>>>>> "RuntimeError: Beam SDK base version 2.24.0 does not match
>>>>>>>>>>> Dataflow Python worker version 2.24.0.dev. Please check
>>>>>>>>>>> Dataflow worker startup logs and make sure that correct version of
>>>>>>>>>>> Beam SDK is installed."
>>>>>>>>>>>
>>>>>>>>>>> It seems like the batch worker needs to be rebuilt. Not sure why
>>>>>>>>>>> the streaming worker did not fail (does it have the correct
>>>>>>>>>>> version? or does it not have the same check?)
>>>>>>>>>>>
>>>>>>>>>>> Ahmet
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Sep 4, 2020 at 1:33 PM Valentyn Tymofieiev wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Dataflow containers are also available now.
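Checking staged artifacts, as in the quoted reply above about verifying the signatures and artifacts, usually pairs a GPG signature check with a checksum comparison. Below is a minimal Python sketch of the checksum half only. It assumes each artifact is staged next to a `.sha512` file whose first whitespace-separated token is the hex digest (the usual `sha512sum` output format); `sha512_of` and `verify_sha512` are hypothetical helper names, not part of any Beam release tooling.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha512(artifact, checksum_file):
    """Compare an artifact against a '<hex digest>  <filename>' checksum file.

    Assumption: the expected digest is the first token of the checksum file.
    """
    expected = Path(checksum_file).read_text().split()[0].lower()
    return sha512_of(artifact) == expected
```

For example, `verify_sha512("apache-beam-2.24.0-source-release.zip", "apache-beam-2.24.0-source-release.zip.sha512")` would check a source release downloaded from the staging area; those file names are assumptions. The signature half is typically done with `gpg --import KEYS` using the KEYS file linked in the vote email, followed by `gpg --verify <file>.asc <file>`.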
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Sep 3, 2020 at 11:47 PM Daniel Oliveira wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> This should fix the BigQueryIO regression that Pablo caught.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As before, Dataflow containers are not yet ready. I or someone
>>>>>>>>>>>>> else will chime in on the thread once it's ready.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Sep 3, 2020 at 11:39 PM Daniel Oliveira wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>> Please review and vote on the release candidate #3 for the
>>>>>>>>>>>>>> version 2.24.0, as follows:
>>>>>>>>>>>>>> [ ] +1, Approve the release
>>>>>>>>>>>>>> [ ] -1, Do not approve the release (please provide specific
>>>>>>>>>>>>>> comments)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The complete staging area is available for your review, which
>>>>>>>>>>>>>> includes:
>>>>>>>>>>>>>> * JIRA release notes [1],
>>>>>>>>>>>>>> * the official Apache source release to be deployed to
>>>>>>>>>>>>>> dist.apache.org [2], which is signed with the key with
>>>>>>>>>>>>>> fingerprint D0E7B69D911ADA3C0482BAA1C4E6B2F8C71D742F [3],
>>>>>>>>>>>>>> * all artifacts to be deployed to the Maven Central
>>>>>>>>>>>>>> Repository [4],
>>>>>>>>>>>>>> * source code tag "v2.24.0-RC3" [5],
>>>>>>>>>>>>>> * website pull request listing the release [6], publishing
>>>>>>>>>>>>>> the API reference manual [7], and the blog post [8].
>>>>>>>>>>>>>> * Java artifacts were built with Maven 3.6.3 and OpenJDK
>>>>>>>>>>>>>> 1.8.0.
>>>>>>>>>>>>>> * Python artifacts are deployed along with the source release
>>>>>>>>>>>>>> to the dist.apache.org [2].
>>>>>>>>>>>>>> * Validation sheet with a tab for 2.24.0 release to help with
>>>>>>>>>>>>>> validation [9].
>>>>>>>>>>>>>> * Docker images published to Docker Hub [10].
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The vote will be open for at least 72 hours.
>>>>>>>>>>>>>> It is adopted by majority approval, with at least 3 PMC
>>>>>>>>>>>>>> affirmative votes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Release Manager
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347146
>>>>>>>>>>>>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.24.0/
>>>>>>>>>>>>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>>>>>>>>>>>>> [4] https://repository.apache.org/content/repositories/orgapachebeam-1110/
>>>>>>>>>>>>>> [5] https://github.com/apache/beam/tree/v2.24.0-RC3
>>>>>>>>>>>>>> [6] https://github.com/apache/beam/pull/12743
>>>>>>>>>>>>>> [7] https://github.com/apache/beam-site/pull/607
>>>>>>>>>>>>>> [8] https://github.com/apache/beam/pull/12745
>>>>>>>>>>>>>> [9] https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1432428331
>>>>>>>>>>>>>> [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image
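The adoption rule stated in the vote email (majority approval, with at least 3 PMC affirmative votes) can be expressed as a small tally function. This is a minimal Python sketch under two assumptions: only binding (PMC) votes count toward the threshold and the +1/-1 comparison, and a voter's latest vote replaces any earlier one, as happens in this thread when votes are changed or upgraded. `Vote` and `majority_approved` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1, 0, or -1
    binding: bool   # True for PMC members (assumption: only these count)


def majority_approved(votes: Iterable[Vote], min_binding_plus: int = 3) -> bool:
    """Return True if there are at least `min_binding_plus` binding +1 votes
    and more binding +1 votes than binding -1 votes.

    Votes are processed in order; a later vote from the same voter
    replaces the earlier one.
    """
    latest = {}
    for vote in votes:
        latest[vote.voter] = vote  # later votes override earlier ones
    binding = [v for v in latest.values() if v.binding]
    plus = sum(1 for v in binding if v.value > 0)
    minus = sum(1 for v in binding if v.value < 0)
    return plus >= min_binding_plus and plus > minus
```

Modelling a changed vote as a later entry from the same voter keeps the function simple: the dictionary keeps only the last vote seen per voter, so a -1 that is later changed to +1 is naturally superseded.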
