[ https://issues.apache.org/jira/browse/BEAM-7008?focusedWorklogId=223474&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-223474 ]
ASF GitHub Bot logged work on BEAM-7008:
----------------------------------------
Author: ASF GitHub Bot
Created on: 05/Apr/19 07:27
Start Date: 05/Apr/19 07:27
Worklog Time Spent: 10m
Work Description: robertwb commented on issue #8228: [BEAM-7008]
standardize UTF-8 string coder encodings
URL: https://github.com/apache/beam/pull/8228#issuecomment-480175987
FYI, the current Python implementation prefixes the encoded string with its
length if and only if the context is nested; see
https://github.com/apache/beam/blob/release-2.12.0/sdks/python/apache_beam/coders/coder_impl.py#L199
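
To illustrate the comment above, here is a minimal sketch of context-dependent
encoding. It is not Beam's actual code: the names write_varint, read_varint,
encode_str and decode_str are hypothetical, and the length is assumed to be the
UTF-8 byte count written as a base-128 varint. It only shows why a nested value
needs a length prefix while the outermost value can run to the end of the stream.

```python
import io


def write_varint(stream, n):
    """Write a non-negative integer as a base-128 varint (hypothetical helper)."""
    if n < 0:
        raise ValueError("varint value must be non-negative")
    while True:
        bits = n & 0x7F
        n >>= 7
        if n:
            stream.write(bytes([bits | 0x80]))  # more groups follow
        else:
            stream.write(bytes([bits]))  # last group, high bit clear
            return


def read_varint(stream):
    """Read a base-128 varint written by write_varint."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("truncated varint")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return result
        shift += 7


def encode_str(value, stream, nested):
    """Write UTF-8 bytes, length-prefixed only when the context is nested."""
    data = value.encode("utf-8")
    if nested:
        write_varint(stream, len(data))
    stream.write(data)


def decode_str(stream, nested):
    """Read a string back; an un-nested value consumes the rest of the stream."""
    if nested:
        size = read_varint(stream)
        return stream.read(size).decode("utf-8")
    return stream.read().decode("utf-8")
```

In this sketch, two strings written back to back can only be separated again if
the first one was written with nested=True; the last one may omit the prefix.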
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 223474)
Time Spent: 40m (was: 0.5h)
> standardize UTF-8 string coder encodings
> ----------------------------------------
>
> Key: BEAM-7008
> URL: https://issues.apache.org/jira/browse/BEAM-7008
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core, sdk-py-core
> Reporter: Heejong Lee
> Assignee: Heejong Lee
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> It looks like the UTF-8 string coders in the Java and Python SDKs use
> different encoding schemes. StringUtf8Coder in the Java SDK writes the
> varint-encoded length of the input string before the actual data bytes,
> whereas StrUtf8Coder in the Python SDK encodes the input string directly
> to a bytes value. We should unify the encoding schemes for UTF-8 strings
> across the SDKs and make this a standard coder.
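
The difference described in the issue can be sketched as follows. This is an
illustration only, not code from either SDK: encode_varint,
encode_length_prefixed and encode_raw are hypothetical names, and the prefix is
assumed to be the UTF-8 byte length encoded as a base-128 varint.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (hypothetical helper)."""
    if n < 0:
        raise ValueError("varint value must be non-negative")
    out = bytearray()
    while True:
        bits = n & 0x7F
        n >>= 7
        if n:
            out.append(bits | 0x80)  # more groups follow
        else:
            out.append(bits)  # last group, high bit clear
            return bytes(out)


def encode_length_prefixed(value: str) -> bytes:
    """Scheme attributed to the Java coder: varint length, then the data bytes."""
    data = value.encode("utf-8")
    return encode_varint(len(data)) + data


def encode_raw(value: str) -> bytes:
    """Scheme attributed to the Python coder: the UTF-8 bytes only."""
    return value.encode("utf-8")


if __name__ == "__main__":
    # The same input yields different byte sequences under the two schemes.
    print(encode_length_prefixed("abc"))
    print(encode_raw("abc"))
```

A reader expecting one scheme would misinterpret bytes produced by the other,
which is the interoperability problem the issue asks to resolve.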