[
https://issues.apache.org/jira/browse/BEAM-4858?focusedWorklogId=240460&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-240460
]
ASF GitHub Bot logged work on BEAM-4858:
----------------------------------------
Author: ASF GitHub Bot
Created on: 10/May/19 22:41
Start Date: 10/May/19 22:41
Worklog Time Spent: 10m
Work Description: yifanmai commented on pull request #8556: [BEAM-4858]
Increase tolerance on linear regression tests
URL: https://github.com/apache/beam/pull/8556
Currently `BatchElementsTest.test_no_numpy_regression` and
`BatchElementsTest.test_numpy_regression` use `assertAlmostEqual` with a delta
that exactly matches the error expected from the random inputs. As a result,
any additional error from floating point calculations can cause the tests to
fail non-deterministically. This change adds extra tolerance to account for
that additional error.
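As a rough illustration of the failure mode (not the actual Beam test code), the sketch below compares a computed value against its expected value, first with a delta that covers only the input-driven error and then with a small extra allowance for floating point rounding. The helper `within` and the slack value `1e-12` are assumptions chosen for the example.

```python
import unittest

# A bare TestCase instance gives access to assertAlmostEqual outside a test run.
_tc = unittest.TestCase()


def within(computed, expected, delta):
    """Return True if assertAlmostEqual accepts the pair for the given delta."""
    try:
        _tc.assertAlmostEqual(computed, expected, delta=delta)
        return True
    except AssertionError:
        return False


# Hypothetical numbers: the inputs contribute no error here, so the
# "exact" delta derived from the inputs alone is zero.
expected_delta = 0.0
computed = 0.1 + 0.2  # carries a tiny floating point rounding error
expected = 0.3

# With the exact delta, rounding error alone is enough to fail the check.
fragile = within(computed, expected, expected_delta)

# Adding a small slack (an assumed value) absorbs the rounding error.
fp_slack = 1e-12
robust = within(computed, expected, expected_delta + fp_slack)

print(fragile, robust)
```

The same idea applies when the expected delta is non-zero: the assertion budget is the input-driven error plus a small margin for arithmetic.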
See
[.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md)
for trigger phrase, status and link of all Jenkins jobs.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 240460)
Time Spent: 5h 50m (was: 5h 40m)
> Clean up _BatchSizeEstimator in element-batching transform.
> -----------------------------------------------------------
>
> Key: BEAM-4858
> URL: https://issues.apache.org/jira/browse/BEAM-4858
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Valentyn Tymofieiev
> Assignee: Robert Bradshaw
> Priority: Minor
> Fix For: 2.8.0
>
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
> Beam Python 3 conversion [exposed|https://github.com/apache/beam/pull/5729]
> non-trivial performance-sensitive logic in element-batching transform. Let's
> take a look at
> [util.py#L271|https://github.com/apache/beam/blob/e98ff7c96afa2f72b3a98426dc1e9a47224da5c8/sdks/python/apache_beam/transforms/util.py#L271].
>
> Due to Python 2 language semantics, the result of {{x2 / x1}} will depend on
> the type of the keys - whether they are integers or floats.
> The keys of key-value pairs contained in {{self._data}} are added as integers
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L260],
> however, when we 'thin' the collected entries
> [here|https://github.com/apache/beam/blob/d2ac08da2dccce8930432fae1ec7c30953880b69/sdks/python/apache_beam/transforms/util.py#L279],
> the keys will become floats. Surprisingly, using either integer or float
> division consistently [in the
> comparator|https://github.com/apache/beam/blob/e98ff7c96afa2f72b3a98426dc1e9a47224da5c8/sdks/python/apache_beam/transforms/util.py#L271]
> negatively affects the performance of a custom pipeline I was using to
> benchmark these changes. The performance impact likely comes from changes in
> the logic that depends on how division is evaluated, not from the
> performance of the division operation itself.
> In terms of Python 3 conversion, the best course of action that avoids
> regression seems to be to preserve the existing Python 2 behavior using
> {{old_div}} from {{past.utils.division}}. In the medium term, we should clean
> up the logic. We may want to add a targeted microbenchmark to evaluate the
> performance of this code, and maybe cythonize it, since it seems to be
> performance-sensitive.
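The type dependence described in the issue can be sketched as follows. This is a minimal illustration, not the code in `util.py`: `py2_div` is a hypothetical local stand-in that mimics Python 2's `/` operator (floor division when both operands are integers, true division otherwise), which is the behavior `old_div` is meant to preserve.

```python
import numbers


def py2_div(a, b):
    """Hypothetical stand-in for Python 2's `/` operator."""
    if isinstance(a, numbers.Integral) and isinstance(b, numbers.Integral):
        return a // b  # both integers: floor division, as in Python 2
    return a / b       # any float operand: true division


# Integer keys, as when entries are first recorded.
ratio_int_keys = py2_div(7, 2)

# The same key values after they have become floats (for example, after
# thinning); the ratio now keeps its fractional part.
ratio_float_keys = py2_div(7.0, 2)

print(ratio_int_keys, ratio_float_keys)  # prints: 3 3.5
```

Any comparison built on such a ratio can therefore take a different branch depending on whether the keys are still integers, which is consistent with the issue's observation that the logic, not the cost of division, drives the performance difference.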
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)