+1 (binding)

Rebased fork and ran internal performance tests.

While doing so, I ran into the unit test issue below with the fn_runner
(Python direct runner), which did not occur with 2.21 [1]. The lack of
support for processing time timers wasn't an issue previously, because the
timer, though declared, wasn't exercised in the unit test. Now the runner
rejects the pipeline up front in _check_requirements, based on the declared
timer spec alone.

Is there a plan/JIRA to support processing time timers with the direct
runner?

Thanks,
Thomas


[1]
https://gist.github.com/tweise/6f8ca6341711f579b0ed9943b8f25138#file-synthetic_stateful-py-L250

/code/venvs/venv/lib/python3.6/site-packages/apache_beam/pipeline.py:555: in __exit__
    self.result = self.run()
/code/venvs/venv/lib/python3.6/site-packages/apache_beam/pipeline.py:521: in run
    allow_proto_holders=True).run(False)
/code/venvs/venv/lib/python3.6/site-packages/apache_beam/pipeline.py:534: in run
    return self.runner.run_pipeline(self, self._options)
/code/venvs/venv/lib/python3.6/site-packages/apache_beam/runners/direct/direct_runner.py:119: in run_pipeline
    return runner.run_pipeline(pipeline, options)
/code/venvs/venv/lib/python3.6/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:176: in run_pipeline
    pipeline.to_runner_api(default_environment=self._default_environment))
/code/venvs/venv/lib/python3.6/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:182: in run_via_runner_api
    self._check_requirements(pipeline_proto)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <apache_beam.runners.portability.fn_api_runner.fn_runner.FnApiRunner object at 0x7fd2a896b400>
pipeline_proto = components {
  transforms {
    key: "ref_AppliedPTransform_AppliedPTransform_1"
    value {
      subtransforms: "ref...}
}
root_transform_ids: "ref_AppliedPTransform_AppliedPTransform_1"
requirements: "beam:requirement:pardo:stateful:v1"


    def _check_requirements(self, pipeline_proto):
      """Check that this runner can satisfy all pipeline requirements."""
      supported_requirements = set(self.supported_requirements())
      for requirement in pipeline_proto.requirements:
        if requirement not in supported_requirements:
          raise ValueError(
              'Unable to run pipeline with requirement: %s' % requirement)
      for transform in pipeline_proto.components.transforms.values():
        if transform.spec.urn == common_urns.primitives.TEST_STREAM.urn:
          raise NotImplementedError(transform.spec.urn)
        elif transform.spec.urn in translations.PAR_DO_URNS:
          payload = proto_utils.parse_Bytes(
              transform.spec.payload, beam_runner_api_pb2.ParDoPayload)
          for timer in payload.timer_family_specs.values():
            if timer.time_domain != beam_runner_api_pb2.TimeDomain.EVENT_TIME:
>             raise NotImplementedError(timer.time_domain)
E             NotImplementedError: 2

/code/venvs/venv/lib/python3.6/site-packages/apache_beam/runners/portability/fn_api_runner/fn_runner.py:283: NotImplementedError
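
The "NotImplementedError: 2" above carries the raw enum value of the rejected
timer's time domain. In the runner API protos, TimeDomain enumerates
UNSPECIFIED = 0, EVENT_TIME = 1 and PROCESSING_TIME = 2, so the 2 identifies a
processing time timer (declared in the Python SDK with TimeDomain.REAL_TIME).
Because _check_requirements walks the declared timer_family_specs of every
ParDo, the pipeline fails before any timer is ever set. A minimal,
self-contained sketch of that logic, assuming a hypothetical TimeDomain
IntEnum and a hypothetical check_timer_domains helper in place of the real
protos:

```python
from enum import IntEnum


class TimeDomain(IntEnum):
    """Hypothetical stand-in mirroring the beam_runner_api_pb2.TimeDomain values."""
    UNSPECIFIED = 0
    EVENT_TIME = 1
    PROCESSING_TIME = 2


def check_timer_domains(declared_timers):
    """Reject any declared timer whose domain is not event time.

    declared_timers maps a timer family name to its TimeDomain. Like the
    fn_runner check, this only inspects declarations, so a timer that is
    never set still triggers the error.
    """
    for domain in declared_timers.values():
        if domain != TimeDomain.EVENT_TIME:
            # Raise with the bare enum value, hence "NotImplementedError: 2".
            raise NotImplementedError(int(domain))


# An event time timer passes the check.
check_timer_domains({"flush": TimeDomain.EVENT_TIME})

# A declared processing time timer is rejected even if it never fires.
try:
    check_timer_domains({"flush": TimeDomain.PROCESSING_TIME})
except NotImplementedError as err:
    print("NotImplementedError:", err)  # prints "NotImplementedError: 2"
```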



On Thu, Sep 10, 2020 at 4:41 PM Robert Bradshaw <[email protected]> wrote:

> Given the additional information, I am upgrading my vote to +1 (binding)
> based on my prior analysis.
>
> On Thu, Sep 10, 2020 at 4:14 PM Kyle Weaver <[email protected]> wrote:
>
>> +1 (non-binding)
>>
>> Validated wordcount with Python 3.7.8 and Flink 1.10.0 (both loopback and
>> using the Docker image). Also Python 3.7.8 loopback with an embedded Spark
>> cluster.
>>
>> On Thu, Sep 10, 2020 at 2:32 PM Daniel Oliveira <[email protected]>
>> wrote:
>>
>>> By the way, most of the validation so far has covered Direct runner and
>>> Dataflow, but Flink and Spark still have little validation, so if anyone
>>> can help with those it will help speed up the release.
>>>
>>> On Thu, Sep 10, 2020 at 2:12 PM Daniel Oliveira <[email protected]>
>>> wrote:
>>>
>>>> So I tracked the --temp_location issue down to
>>>> https://github.com/apache/beam/pull/12203 and asked @Pablo Estrada
>>>> <[email protected]> and @Chamikara Jayalath <[email protected]> about
>>>> it. It's not exactly a bug, but an intended change in requirements for
>>>> WriteToBigQuery, so the only fix needed is to update the test script
>>>> with the appropriate flag, which should be easy. It also won't require
>>>> building a new release candidate.
>>>>
>>>> There is a possibility that user pipelines will break if they're using
>>>> BigQuery with the Python Direct Runner, so I'll add a note to the changelog
>>>> about it, but I don't think the change is significant enough to need
>>>> anything beyond that.
>>>>
>>>> On Thu, Sep 10, 2020 at 1:47 PM Chamikara Jayalath <
>>>> [email protected]> wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> Thanks,
>>>>> Cham
>>>>>
>>>>> On Thu, Sep 10, 2020 at 11:26 AM Ahmet Altay <[email protected]> wrote:
>>>>>
>>>>>> +1 - validated py3 quickstarts. The problem I mentioned earlier is
>>>>>> resolved.
>>>>>>
>>>>>> On Wed, Sep 9, 2020 at 7:46 PM Daniel Oliveira <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Good news: According to
>>>>>>> https://ci-beam.apache.org/job/beam_PostRelease_Python_Candidate/188/consoleFull
>>>>>>> the Streaming Wordcount quickstart works for Dataflow with Python 2.7, so
>>>>>>> it looks like the container issue might be fixed.
>>>>>>>
>>>>>>> Bad news: That same Jenkins job failed on "Running HourlyTeamScore
>>>>>>> example with DirectRunner" because it's missing a --temp_location flag,
>>>>>>> despite using the DirectRunner. This looks like a bug, but I'm still
>>>>>>> investigating whether it'll need another cherry-pick and RC to fix or if
>>>>>>> the validation script just needs to be updated. I'll update the thread if
>>>>>>> I find anything.
>>>>>>>
>>>>>>
>>>>>> Probably it does not require a cherry-pick. We have not validated
>>>>>> that workflow in the past few releases.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On Wed, Sep 9, 2020 at 4:58 PM Daniel Oliveira <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> The Dataflow Python Batch worker issue should be fixed now. I tried
>>>>>>>> verifying it myself via the RC validation script, but I've been having
>>>>>>>> some trouble with GCP authentication, so if someone else can validate
>>>>>>>> it, that would be a big help.
>>>>>>>>
>>>>>>>> On Tue, Sep 8, 2020 at 5:51 PM Robert Bradshaw <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I verified the signatures and all the artifacts are correct, and
>>>>>>>>> tested a wheel in a fresh virtual environment. It'd be good to see the
>>>>>>>>> Dataflow issue confirmed as fixed though.
>>>>>>>>>
>>>>>>>>> On Tue, Sep 8, 2020 at 5:17 PM Valentyn Tymofieiev <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> This error comes from the Dataflow Python Batch worker.
>>>>>>>>>>
>>>>>>>>>> Streaming workflows use the SDK worker, which is provided by the
>>>>>>>>>> apache-beam library, so the versions will match.
>>>>>>>>>>
>>>>>>>>>> The error should be fixed by setting the correct Dataflow worker
>>>>>>>>>> version in Dataflow containers, and does not affect Beam RC.
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 8, 2020 at 4:52 PM Ahmet Altay <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> -1 - I validated py3 quickstarts on dataflow and direct runner.
>>>>>>>>>>> I ran into 1 issue with batch workflows on dataflow:
>>>>>>>>>>>
>>>>>>>>>>> "RuntimeError: Beam SDK base version 2.24.0 does not match
>>>>>>>>>>> Dataflow Python worker version 2.24.0.dev. Please check
>>>>>>>>>>> Dataflow worker startup logs and make sure that correct version of
>>>>>>>>>>> Beam SDK is installed."
>>>>>>>>>>>
>>>>>>>>>>> It seems like the batch worker needs to be rebuilt. Not sure why
>>>>>>>>>>> the streaming worker did not fail (does it have the correct version,
>>>>>>>>>>> or does it not have the same check?)
>>>>>>>>>>>
>>>>>>>>>>> Ahmet
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Sep 4, 2020 at 1:33 PM Valentyn Tymofieiev <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Dataflow containers are also available now.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Sep 3, 2020 at 11:47 PM Daniel Oliveira <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> This should fix the BigQueryIO regression that Pablo caught.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As before, Dataflow containers are not yet ready. I or someone
>>>>>>>>>>>>> else will chime in on the thread once it's ready.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Sep 3, 2020 at 11:39 PM Daniel Oliveira <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>> Please review and vote on the release candidate #3 for the
>>>>>>>>>>>>>> version 2.24.0, as follows:
>>>>>>>>>>>>>> [ ] +1, Approve the release
>>>>>>>>>>>>>> [ ] -1, Do not approve the release (please provide specific
>>>>>>>>>>>>>> comments)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The complete staging area is available for your review, which
>>>>>>>>>>>>>> includes:
>>>>>>>>>>>>>> * JIRA release notes [1],
>>>>>>>>>>>>>> * the official Apache source release to be deployed to
>>>>>>>>>>>>>> dist.apache.org [2], which is signed with the key with
>>>>>>>>>>>>>> fingerprint D0E7B69D911ADA3C0482BAA1C4E6B2F8C71D742F [3],
>>>>>>>>>>>>>> * all artifacts to be deployed to the Maven Central
>>>>>>>>>>>>>> Repository [4],
>>>>>>>>>>>>>> * source code tag "v2.24.0-RC3" [5],
>>>>>>>>>>>>>> * website pull request listing the release [6], publishing
>>>>>>>>>>>>>> the API reference manual [7], and the blog post [8].
>>>>>>>>>>>>>> * Java artifacts were built with Maven 3.6.3 and OpenJDK
>>>>>>>>>>>>>> 1.8.0.
>>>>>>>>>>>>>> * Python artifacts are deployed along with the source release
>>>>>>>>>>>>>> to dist.apache.org [2].
>>>>>>>>>>>>>> * Validation sheet with a tab for 2.24.0 release to help with
>>>>>>>>>>>>>> validation [9].
>>>>>>>>>>>>>> * Docker images published to Docker Hub [10].
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The vote will be open for at least 72 hours. It is adopted by
>>>>>>>>>>>>>> majority approval, with at least 3 PMC affirmative votes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Release Manager
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12347146
>>>>>>>>>>>>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.24.0/
>>>>>>>>>>>>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>>>>>>>>>>>>> [4]
>>>>>>>>>>>>>> https://repository.apache.org/content/repositories/orgapachebeam-1110/
>>>>>>>>>>>>>> [5] https://github.com/apache/beam/tree/v2.24.0-RC3
>>>>>>>>>>>>>> [6] https://github.com/apache/beam/pull/12743
>>>>>>>>>>>>>> [7] https://github.com/apache/beam-site/pull/607
>>>>>>>>>>>>>> [8] https://github.com/apache/beam/pull/12745
>>>>>>>>>>>>>> [9]
>>>>>>>>>>>>>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1432428331
>>>>>>>>>>>>>> [10] https://hub.docker.com/search?q=apache%2Fbeam&type=image
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
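
The adoption rule in the RC3 call for votes above (open for at least 72
hours, majority approval, at least 3 PMC affirmative votes) can be expressed
as a small tally. A minimal sketch, assuming each vote is recorded as a
hypothetical (value, binding) tuple, where binding marks a PMC vote, and
assuming only binding votes enter the majority comparison; the function name
release_vote_passes is illustrative only:

```python
from datetime import datetime, timedelta

# The call for votes says the vote stays open for at least 72 hours.
MIN_VOTE_DURATION = timedelta(hours=72)


def release_vote_passes(votes, opened_at, closed_at):
    """Return True if a release vote is adopted under the stated rule.

    votes: list of hypothetical (value, binding) tuples, value in {+1, 0, -1},
    binding True for PMC members. Non-binding votes are treated as feedback
    only (an assumption of this sketch).
    """
    if closed_at - opened_at < MIN_VOTE_DURATION:
        return False  # Closed too early to count.
    binding_plus = sum(1 for value, binding in votes if binding and value > 0)
    binding_minus = sum(1 for value, binding in votes if binding and value < 0)
    # Majority approval: at least three binding +1s and more +1s than -1s.
    return binding_plus >= 3 and binding_plus > binding_minus


# Made-up votes, not the tally of this thread.
opened = datetime(2020, 1, 1, 12, 0)
votes = [(1, True), (1, True), (1, True), (1, False), (-1, False)]
print(release_vote_passes(votes, opened, opened + timedelta(hours=80)))  # prints True
print(release_vote_passes(votes, opened, opened + timedelta(hours=48)))  # prints False
```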
