[ https://issues.apache.org/jira/browse/BEAM-5720?focusedWorklogId=153847&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-153847 ]
ASF GitHub Bot logged work on BEAM-5720:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Oct/18 09:40
            Start Date: 12/Oct/18 09:40
    Worklog Time Spent: 10m
      Work Description: robertwb commented on a change in pull request #6659: [BEAM-5720] Fix encoding of large python ints in Python 3.
URL: https://github.com/apache/beam/pull/6659#discussion_r224728870

 ##########
 File path: sdks/python/apache_beam/coders/coder_impl.py
 ##########
 @@ -293,8 +299,21 @@ def encode_to_stream(self, value, stream, nested):
      if value is None:
        stream.write_byte(NONE_TYPE)
      elif t is int:
 -      stream.write_byte(INT_TYPE)
 -      stream.write_var_int64(value)
 +      # In Python 3, an int may be larger than 64 bits.
 +      # Note that an OverflowError on stream.write_var_int64 would happen
 +      # *after* the marker byte is written, so we must check earlier.
 +      try:
 +        # This may throw an overflow error when compiled.
 +        int_value = value
 +        # Otherwise, we must check ourselves.
 +        if not is_compiled:
 +          if not fits_in_64_bits(value):

 Review comment:
   Yep. Current code:

   ```
   small_int, FastPrimitivesCoder, 1000 element(s): per element median time cost: 1.1301e-07 sec, relative std: 19.00%
   ```

   Removing the is_compiled check

   ```
   small_int, FastPrimitivesCoder, 1000 element(s): per element median time cost: 1.88589e-07 sec, relative std: 18.08%
   ```

   That's over a 60% increase. I tried inlining it and using constants rather than computing the bounds each time, which helps some but the check is entirely redundant on compiled code.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 153847)
    Time Spent: 50m  (was: 40m)

> Default coder breaks with large ints on Python 3
> ------------------------------------------------
>
>                 Key: BEAM-5720
>                 URL: https://issues.apache.org/jira/browse/BEAM-5720
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Robert Bradshaw
>            Assignee: Valentyn Tymofieiev
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> The test for `int` includes greater than 64-bit values, which causes an
> overflow error later in the code. We need to only use that coding scheme for
> machine-sized ints.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
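
For readers following the thread, here is a minimal sketch of the "check before writing the marker byte" idea that the diff and the issue description talk about. It is not the code from the pull request: the marker values, the `LONG_TYPE` fallback, the length-prefixed layout, and the use of a plain `bytearray` instead of Beam's stream are all assumptions made for illustration. Only the name `fits_in_64_bits` echoes the diff, and its body here is a guess at what such a helper might look like, using precomputed constant bounds as the review comment suggests.

```python
# Illustrative only: hypothetical constants and layout, not Beam's coder_impl.py.
INT64_MIN = -(1 << 63)      # bounds computed once, not on every call
INT64_MAX = (1 << 63) - 1

INT_TYPE = 1                # assumed marker for machine-sized ints
LONG_TYPE = 2               # assumed marker for arbitrary-size ints


def fits_in_64_bits(value):
    """Return True if value is representable as a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


def encode_int(value, out):
    """Append a marker byte and payload for value to the bytearray out.

    The range check runs before anything is appended, so an oversized int
    never leaves a stray INT_TYPE marker in the output.
    """
    if fits_in_64_bits(value):
        out.append(INT_TYPE)
        out += value.to_bytes(8, "big", signed=True)
    else:
        # Enough bytes for the magnitude plus a sign bit.
        length = value.bit_length() // 8 + 1
        out.append(LONG_TYPE)
        out += length.to_bytes(4, "big")
        out += value.to_bytes(length, "big", signed=True)
    return out


small = encode_int(5, bytearray())
large = encode_int(1 << 63, bytearray())
print(small[0] == INT_TYPE, large[0] == LONG_TYPE)
```

In pure Python the comparison against two constants is the whole cost of the check. The review comment's point is that compiled code already raises an overflow error on the 64-bit assignment, so the same comparison there is redundant work.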