[
https://issues.apache.org/jira/browse/BEAM-3981?focusedWorklogId=109566&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109566
]
ASF GitHub Bot logged work on BEAM-3981:
----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Jun/18 22:08
Start Date: 06/Jun/18 22:08
Worklog Time Spent: 10m
Work Description: tvalentyn commented on issue #5053: [BEAM-3981]
Futurize coders subpackage
URL: https://github.com/apache/beam/pull/5053#issuecomment-395229849
An update on `dict.iteritems()` vs `dict.items()` vs
`future.utils.iteritems(dict)`: I did more performance testing of the
encode-decode operation using a microbenchmark (currently in flight:
https://github.com/apache/beam/pull/5565).
I don't observe a difference in performance between `dict.iteritems()` and
`future.utils.iteritems(dict)`.
As for `dict.items()` vs `dict.iteritems()`, I saw a 2x slowdown in the
coder implementation with `dict.items()` for dictionaries with over
100000 entries, but did not observe a significant difference on dictionaries
with 10000 entries or fewer. That said, I think it would not hurt to keep using
`iteritems()` for Python 2 as we do now.
With `future.utils.iteritems()`:
```
Median time cost:
Dict[int, int], FastPrimitiveCoder : per element median time cost:
3.27529e-07 sec
```
With `dict.iteritems()`:
```
Median time cost:
Dict[int, int], FastPrimitiveCoder : per element median time cost:
3.4485e-07 sec
```
With `dict.items()`:
```
Median time cost:
Dict[int, int], FastPrimitiveCoder : per element median time cost:
7.3393e-07 sec
```
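The dict comparison above can be approximated with a small standalone script. This is a minimal sketch, not the benchmark from PR #5565: the helper `per_element_median` and the two loop functions are hypothetical names, and it does not exercise the Beam coders at all. Because it targets Python 3, where `dict.iteritems()` no longer exists, it mimics the list-building Python 2 `dict.items()` with `list(d.items())` and the lazy `iteritems()` with the `d.items()` view.

```python
import statistics
import timeit


def per_element_median(func, num_elements, repeats=5):
    """Return the median wall-clock time of func() divided by num_elements."""
    timings = timeit.repeat(func, number=1, repeat=repeats)
    return statistics.median(timings) / num_elements


def iterate_lazy(d):
    # Python 3 view object; plays the role of Python 2 dict.iteritems().
    total = 0
    for key, value in d.items():
        total += key + value
    return total


def iterate_materialized(d):
    # Builds a full list of pairs first, as Python 2 dict.items() did.
    total = 0
    for key, value in list(d.items()):
        total += key + value
    return total


if __name__ == "__main__":
    for size in (10000, 100000):
        data = {i: i for i in range(size)}
        lazy = per_element_median(lambda: iterate_lazy(data), size)
        eager = per_element_median(lambda: iterate_materialized(data), size)
        print("size=%d lazy=%.3e sec eager=%.3e sec" % (size, lazy, eager))
```

Absolute numbers from such a script depend on the interpreter and machine, so they are not comparable to the figures quoted in this comment.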
I also observe a 2.5x degradation in the coder implementation with
`builtins.range()` compared to the native Python 2 `range()` on lists as
small as 1000 to 10000 elements. I did not try smaller lists.
With 10000 elements, Python 2 `range()`:
```
Median time cost:
List[int], FastPrimitiveCoder : per element median time cost:
1.17695e-07 sec
```
With `builtins.range()`:
```
Median time cost:
List[int], FastPrimitiveCoder : per element median time cost:
3.22402e-07 sec
```
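One possible contributor, offered here as an assumption rather than something measured in this comment: on Python 2, `builtins.range` from the `future` package is a backport of the Python 3 `range` type written in Python, so iteration runs Python-level code instead of C. The sketch below illustrates that kind of overhead with a hypothetical `PurePythonRange` class. It is not the `future` implementation and does not attempt to reproduce the 2.5x figure.

```python
import timeit


class PurePythonRange(object):
    """Hypothetical stand-in for a range type implemented in Python."""

    def __init__(self, stop):
        self._stop = stop

    def __len__(self):
        return self._stop

    def __iter__(self):
        # Every element is produced by Python bytecode, not C.
        i = 0
        while i < self._stop:
            yield i
            i += 1


def consume(iterable):
    """Sum all values so that each element is actually visited."""
    total = 0
    for value in iterable:
        total += value
    return total


if __name__ == "__main__":
    n = 10000
    runs = 100
    native = min(timeit.repeat(lambda: consume(range(n)), number=runs, repeat=5))
    pure = min(timeit.repeat(lambda: consume(PurePythonRange(n)), number=runs, repeat=5))
    print("native range: %.3e sec per element" % (native / (runs * n)))
    print("pure-Python range: %.3e sec per element" % (pure / (runs * n)))
```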
We should try to use microbenchmarks for performance evaluations moving
forward since they can provide feedback in a matter of seconds.
Issue Time Tracking
-------------------
Worklog Id: (was: 109566)
Time Spent: 19h 10m (was: 19h)
> Futurize and fix python 2 compatibility for coders package
> ----------------------------------------------------------
>
> Key: BEAM-3981
> URL: https://issues.apache.org/jira/browse/BEAM-3981
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Robbe
> Assignee: Robbe
> Priority: Major
> Fix For: Not applicable
>
> Time Spent: 19h 10m
> Remaining Estimate: 0h
>
> Run automatic conversion with futurize tool on coders subpackage and fix
> python 2 compatibility. This prepares the subpackage for python 3 support.