[ https://issues.apache.org/jira/browse/BEAM-7121?focusedWorklogId=238919&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-238919 ]

ASF GitHub Bot logged work on BEAM-7121:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/May/19 00:52
            Start Date: 08/May/19 00:52
    Worklog Time Spent: 10m 
      Work Description: yifanmai commented on issue #8377: [BEAM-7121] Add 
deterministic proto coder
URL: https://github.com/apache/beam/pull/8377#issuecomment-490306076
 
 
   Encoding with the deterministic coder is roughly 20% slower with 10 map keys,
65% slower with 100 keys, and 70% slower with 1000 keys.
   
   Should we keep the deterministic coder separate then?
   
   Benchmark setup:
   ```
   import timeit


   def benchmark_proto_coder(num_keys, deterministic):
     # Build the message once in setup; only coder.encode() is timed.
     setup = (
         "import random\n"
         "from apache_beam.coders import coders\n"
         "from apache_beam.coders import "
         "proto2_coder_test_messages_pb2 as test_message\n"
         "message = test_message.MessageWithMap()\n"
         "coder = coders.ProtoCoder(message.__class__, deterministic=%s)\n"
         # list() so the keys can be shuffled in place on Python 3 too.
         "keys = list(range(%d))\n"
         "random.shuffle(keys)\n"
         "for i in keys:\n"
         "    message.field1['key_%%s' %% i].field1 = "
         "'Hello world %%s' %% i") % (deterministic, num_keys)
     timings = timeit.repeat(
         'coder.encode(message)', setup=setup, number=10000, repeat=5)
     print(
         'Benchmark timings for (num_keys=%d, deterministic=%s): %s' %
         (num_keys, deterministic, timings))


   # Same order as the results below.
   for num_keys in (10, 100, 1000):
     for deterministic in (False, True):
       benchmark_proto_coder(num_keys, deterministic)
   ```
   Benchmark results:
   
   ```
   Benchmark timings for (num_keys=10, deterministic=False): [0.11162495613098145, 0.11867713928222656, 0.09691786766052246, 0.09254908561706543, 0.0923910140991211]
   Benchmark timings for (num_keys=10, deterministic=True): [0.12508201599121094, 0.12107491493225098, 0.1263580322265625, 0.12426280975341797, 0.12407898902893066]
   Benchmark timings for (num_keys=100, deterministic=False): [0.765225887298584, 0.7647609710693359, 0.7648091316223145, 0.7889750003814697, 0.794219970703125]
   Benchmark timings for (num_keys=100, deterministic=True): [1.1826050281524658, 1.2132699489593506, 1.3479349613189697, 1.328895092010498, 1.3238089084625244]
   Benchmark timings for (num_keys=1000, deterministic=False): [7.4996819496154785, 7.6137168407440186, 7.3893210887908936, 7.379233121871948, 7.4038591384887695]
   Benchmark timings for (num_keys=1000, deterministic=True): [12.842539072036743, 12.79149603843689, 12.759114027023315, 12.604343891143799, 13.032737970352173]
   ```
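The comment does not say how the slowdown percentages were derived. Assuming they compare the mean of the five repeats for each configuration, the short sketch below recomputes them from the timings listed above; the names `TIMINGS` and `slowdown_percent` are illustrative only.

```python
from statistics import mean

# Timings copied from the benchmark results above: seconds for 10000
# calls to coder.encode(), five repeats per (num_keys, deterministic) pair.
TIMINGS = {
    (10, False): [0.11162495613098145, 0.11867713928222656,
                  0.09691786766052246, 0.09254908561706543,
                  0.0923910140991211],
    (10, True): [0.12508201599121094, 0.12107491493225098,
                 0.1263580322265625, 0.12426280975341797,
                 0.12407898902893066],
    (100, False): [0.765225887298584, 0.7647609710693359,
                   0.7648091316223145, 0.7889750003814697,
                   0.794219970703125],
    (100, True): [1.1826050281524658, 1.2132699489593506,
                  1.3479349613189697, 1.328895092010498,
                  1.3238089084625244],
    (1000, False): [7.4996819496154785, 7.6137168407440186,
                    7.3893210887908936, 7.379233121871948,
                    7.4038591384887695],
    (1000, True): [12.842539072036743, 12.79149603843689,
                   12.759114027023315, 12.604343891143799,
                   13.032737970352173],
}


def slowdown_percent(num_keys):
  """Percent by which the deterministic mean exceeds the default mean."""
  default_mean = mean(TIMINGS[(num_keys, False)])
  deterministic_mean = mean(TIMINGS[(num_keys, True)])
  return (deterministic_mean / default_mean - 1.0) * 100.0


for num_keys in (10, 100, 1000):
  print('num_keys=%d: %.1f%% slower' % (num_keys, slowdown_percent(num_keys)))
```

With these numbers the mean-based slowdowns come out to about 21%, 65% and 72%, in line with the rounded figures quoted in the comment.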
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 


Issue Time Tracking
-------------------

    Worklog Id:     (was: 238919)
    Time Spent: 2h  (was: 1h 50m)

> Provide deterministic version of Python's ProtoCoder
> ----------------------------------------------------
>
>                 Key: BEAM-7121
>                 URL: https://issues.apache.org/jira/browse/BEAM-7121
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-core
>            Reporter: Yifan Mai
>            Priority: Minor
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Passing deterministic=True to proto's 
> [SerializeToString|https://github.com/protocolbuffers/protobuf/blob/60b66a119d17f0a2a595a231bea87cd4f4cf2689/python/google/protobuf/message.py#L189-L204]
> will result in deterministic encoding of maps in protos. This can be used to 
> provide a deterministic version of ProtoCoder.
> This would allow protos to be used as keys when grouping by key.
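The idea in the issue can be sketched in Python as follows. This is a minimal, hypothetical coder (`DeterministicProtoCoderSketch` is not a Beam class and does not subclass Beam's `Coder`), assuming only that the `protobuf` package is installed. `google.protobuf.struct_pb2.Struct` stands in for a message with a map field because its `fields` member is a `map<string, Value>`. Two messages whose map entries were inserted in opposite orders encode to identical bytes, which is the property a grouping key needs.

```python
# Assumes the `protobuf` Python package is installed. Struct is used only
# because its `fields` member is a protobuf map<string, Value>.
from google.protobuf import struct_pb2


class DeterministicProtoCoderSketch(object):
  """Hypothetical coder that encodes protos with stable map ordering."""

  def __init__(self, proto_message_type):
    self.proto_message_type = proto_message_type

  def encode(self, value):
    # deterministic=True requests predictable ordering of map entries,
    # so equal messages serialize to equal bytes.
    return value.SerializeToString(deterministic=True)

  def decode(self, encoded):
    return self.proto_message_type.FromString(encoded)

  def is_deterministic(self):
    return True


def make_struct(items):
  # Insert map entries in exactly the order given.
  message = struct_pb2.Struct()
  for key, number in items:
    message.fields[key].number_value = number
  return message


coder = DeterministicProtoCoderSketch(struct_pb2.Struct)
a = make_struct([('key_%d' % i, i) for i in range(100)])
b = make_struct([('key_%d' % i, i) for i in reversed(range(100))])
encoded_a = coder.encode(a)
encoded_b = coder.encode(b)
print(encoded_a == encoded_b)
print(coder.decode(encoded_a) == a)
```

Whether this behavior should live in a separate coder or behind a `deterministic` flag on `ProtoCoder` is the open question in the comment above; the sketch uses a separate class only for illustration.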



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
