[
https://issues.apache.org/jira/browse/BEAM-4008?focusedWorklogId=110940&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-110940
]
ASF GitHub Bot logged work on BEAM-4008:
----------------------------------------
Author: ASF GitHub Bot
Created on: 12/Jun/18 03:35
Start Date: 12/Jun/18 03:35
Worklog Time Spent: 10m
Work Description: tvalentyn commented on a change in pull request #5336:
[BEAM-4008] Futurize utils subpackage
URL: https://github.com/apache/beam/pull/5336#discussion_r194606891
##########
File path: sdks/python/apache_beam/utils/windowed_value.py
##########
@@ -178,33 +182,14 @@ def __repr__(self):
self.windows,
self.pane_info)
- def __hash__(self):
- return (hash(self.value) +
- 3 * self.timestamp_micros +
- 7 * hash(self.windows) +
- 11 * hash(self.pane_info))
-
- # We'd rather implement __eq__, but Cython supports that via __richcmp__
- # instead. Fortunately __cmp__ is understood by both (but not by Python 3).
- def __cmp__(left, right): # pylint: disable=no-self-argument
- """Compares left and right for equality.
-
- For performance reasons, doesn't actually impose an ordering
- on unequal values (always returning 1).
- """
- if type(left) is not type(right):
- return cmp(type(left), type(right))
+ def _key(self):
+ return self.value, self.timestamp_micros, self.windows, self.pane_info
- # TODO(robertwb): Avoid the type checks?
- # Returns False (0) if equal, and True (1) if not.
- return not WindowedValue._typed_eq(left, right)
+ def __eq__(self, other):
+ return type(self) == type(other) and self._key() == other._key()
- @staticmethod
- def _typed_eq(left, right):
- return (left.timestamp_micros == right.timestamp_micros
- and left.value == right.value
- and left.windows == right.windows
- and left.pane_info == right.pane_info)
+ def __hash__(self):
Review comment:
Thanks for your patience for awaiting the feedback, @RobbeSneyders.
I doublechecked performance of __hash__ using a microbenchmark. Previous
implementation of __hash__ appears to be at least 7.5% faster, possibly due to
overheads to create a tuple, and/or the way Cython compiles it.
I suggest we revert to previous implementation of __hash__. For consistency
we can also implement __eq__ by comparing fields directly.
My microbenchmark can be found in
https://github.com/apache/beam/compare/master...tvalentyn:utils_futurization_benchmark,
see
https://github.com/tvalentyn/beam/commit/67d3df0f1248f43feb84503361f0071efc560fc2.
It depends on the mini-framework that's being finalized in
https://github.com/apache/beam/pull/5565.
CC: @robertwb
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 110940)
Time Spent: 1h 20m (was: 1h 10m)
> Futurize and fix python 2 compatibility for utils subpackage
> ------------------------------------------------------------
>
> Key: BEAM-4008
> URL: https://issues.apache.org/jira/browse/BEAM-4008
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Robbe
> Assignee: Robbe
> Priority: Major
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)