[ 
https://issues.apache.org/jira/browse/BEAM-3981?focusedWorklogId=109566&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109566
 ]

ASF GitHub Bot logged work on BEAM-3981:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Jun/18 22:08
            Start Date: 06/Jun/18 22:08
    Worklog Time Spent: 10m 
      Work Description: tvalentyn commented on issue #5053: [BEAM-3981] 
Futurize coders subpackage
URL: https://github.com/apache/beam/pull/5053#issuecomment-395229849
 
 
   An update on `dict.iteritems()` vs `dict.items()` vs 
`future.utils.iteritems(dict)`: I did more performance testing of the 
encode-decode operation using a microbenchmark (currently in flight: 
https://github.com/apache/beam/pull/5565). 
   
   I don't observe a performance difference between `dict.iteritems()` and 
`future.utils.iteritems(dict)`. 
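
   One plausible reason the two perform alike is that a compatibility 
helper of this kind can simply dispatch to the native method when it exists. 
The sketch below is a hypothetical helper (`iteritems_compat` is an 
illustrative name, not the `future` library's code) showing the idea, under 
the assumption that the helper does nothing beyond one attribute lookup:

```python
def iteritems_compat(mapping):
    """Return an iterator over (key, value) pairs on Python 2 and 3.

    Hypothetical helper for illustration; not taken from `future`.
    """
    # Python 2 dicts expose iteritems(); Python 3 dicts do not.
    method = getattr(mapping, "iteritems", None)
    if method is None:
        # Python 3: items() returns a lazy view, so wrap it in iter().
        return iter(mapping.items())
    # Python 2: delegate to the native lazy iterator.
    return method()


if __name__ == "__main__":
    for key, value in iteritems_compat({"a": 1}):
        print(key, value)
```

   With this shape, the per-call overhead is a single `getattr`, which is 
paid once per dictionary rather than once per element.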
   
   As for `dict.items()` vs `dict.iteritems()`: I saw a 2x slowdown in the 
coder implementation with `dict.items()` for dictionaries with over 100000 
entries, but no significant difference for dictionaries with 10000 entries 
or fewer. Given that, I think it would not hurt to keep using `iteritems()` 
for Python 2, as we do now.
   
   With `future.utils.iteritems()`:
   
   ```
   Median time cost:
   Dict[int, int], FastPrimitiveCoder         : per element median time cost: 
3.27529e-07 sec                                                  
   ```
   With `dict.iteritems()`:
   ```
   Median time cost:
   Dict[int, int], FastPrimitiveCoder         : per element median time cost: 
3.4485e-07 sec
   ```
   With `dict.items()`:
   
   ```
   Median time cost:
   Dict[int, int], FastPrimitiveCoder         : per element median time cost: 
7.3393e-07 sec
   ```
   
   I also observe a 2.5x degradation in the coder implementation with 
`builtins.range()` compared to the Python 2 `range()`, on lists as small as 
1000 to 10000 elements. I did not try smaller lists.
   
   With 10000 elements, Python 2 `range()`:
   ```
   Median time cost:
   List[int], FastPrimitiveCoder              : per element median time cost: 
1.17695e-07 sec
   ```
   
   With `builtins.range()`:
   ```
   Median time cost:
   List[int], FastPrimitiveCoder              : per element median time cost: 
3.22402e-07 sec
   ```
   
   We should try to use microbenchmarks for performance evaluations moving 
forward since they can provide feedback in a matter of seconds.
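
   As a rough illustration of what such a microbenchmark can look like, here 
is a minimal sketch using only the standard library. It is not the benchmark 
from the PR linked above and does not exercise Beam coders. The function 
names are hypothetical, and the list copy only emulates, on Python 3, the 
list that Python 2 `dict.items()` builds:

```python
import statistics
import timeit


def per_element_median_cost(fn, num_elements, repeats=5, number=3):
    """Median seconds per element over `repeats` timing runs of fn()."""
    totals = timeit.repeat(fn, repeat=repeats, number=number)
    return statistics.median(t / (number * num_elements) for t in totals)


def sum_via_list(d):
    # Emulates Python 2 dict.items(): materialize every pair first.
    total = 0
    for k, v in list(d.items()):
        total += k + v
    return total


def sum_via_iterator(d):
    # Emulates Python 2 dict.iteritems(): iterate lazily, no copy.
    total = 0
    for k, v in d.items():
        total += k + v
    return total


if __name__ == "__main__":
    n = 100000
    data = {i: i for i in range(n)}
    for name, fn in [("list", sum_via_list), ("iterator", sum_via_iterator)]:
        cost = per_element_median_cost(lambda: fn(data), n)
        print("%-10s per element median time cost: %g sec" % (name, cost))
```

   Taking the median over several repeats keeps one noisy run from skewing 
the result, and dividing by the element count makes runs on different input 
sizes comparable.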
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 109566)
    Time Spent: 19h 10m  (was: 19h)

> Futurize and fix python 2 compatibility for coders package
> ----------------------------------------------------------
>
>                 Key: BEAM-3981
>                 URL: https://issues.apache.org/jira/browse/BEAM-3981
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Robbe
>            Assignee: Robbe
>            Priority: Major
>             Fix For: Not applicable
>
>          Time Spent: 19h 10m
>  Remaining Estimate: 0h
>
> Run automatic conversion with futurize tool on coders subpackage and fix 
> python 2 compatibility. This prepares the subpackage for python 3 support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
