[
https://issues.apache.org/jira/browse/BEAM-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139101#comment-17139101
]
Robert Burke commented on BEAM-7009:
------------------------------------
[~lcwik]
TIL that the ISO_8859_1 encoded examples are not encoded as ISO_8859_1 strings.
They're ordinary bytes, *interpreted* as ISO_8859_1, and written out that way.
The Java String constructor with a charset takes in bytes and the charset used
to interpret those bytes, not the charset one wishes to encode them in. To
extract ISO_8859_1 bytes from a Java string, one must wrap the bytes in a
String normally, and then call getBytes(charset). The API is being used backwards.
I was converting the raw bytes to UTF8 by decoding an ISO_8859_1 representation
of the strings, but instead I need to "encode" the read-in bytes to
ISO_8859_1 in order to get the native Go representation (UTF8).
I'm assuming this is a mistake, as it's not clear to me why one would do that
intentionally.
This explains why I'm seeing the Python code "encode" everything instead of
"decode" everything, which is very confusing.
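A minimal Go sketch of the direction described above, assuming the yaml examples hold raw coder bytes that were viewed as ISO_8859_1 text before being written out. The helper names latin1Decode and latin1Encode and the sample bytes are hypothetical, not taken from the Beam code or from standard_coders.yaml. ISO_8859_1 maps each byte to the Unicode code point of the same value, so no external package is needed:

```go
package main

import "fmt"

// latin1Decode interprets each byte as the Unicode code point of the same
// value, which is how ISO_8859_1 maps bytes to characters.
func latin1Decode(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

// latin1Encode maps each rune back to a single byte. Runes above U+00FF
// have no ISO_8859_1 representation, so they are reported as an error.
func latin1Encode(s string) ([]byte, error) {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		if r > 0xFF {
			return nil, fmt.Errorf("rune %U is not representable in ISO_8859_1", r)
		}
		out = append(out, byte(r))
	}
	return out, nil
}

func main() {
	// Hypothetical encoded value; not taken from standard_coders.yaml.
	raw := []byte{0x00, 0x7F, 0xC8, 0xFF}

	// Under this reading, the yaml file holds the bytes viewed as text.
	text := latin1Decode(raw)

	// A test reading that text must "encode" it to recover the original bytes.
	got, err := latin1Encode(text)
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", got) // prints "00 7f c8 ff"
}
```

The point of the round trip is that the text read from the file is already a native Go (UTF8) string, so the step that recovers the coder bytes is an encode, not a decode.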
> Add Go SDK test for Standard Coders using the yaml data.
> --------------------------------------------------------
>
> Key: BEAM-7009
> URL: https://issues.apache.org/jira/browse/BEAM-7009
> Project: Beam
> Issue Type: Bug
> Components: beam-model, sdk-go
> Reporter: Robert Burke
> Assignee: Robert Burke
> Priority: P1
> Labels: stale-P2
>
> The Go SDK doesn't currently do this validation against the standard yaml
> file [1].
> The Java and Python equivalents of the test can be found from here [2].
>
> Care would need to be taken so that Beam Go SDK users (such as they are)
> aren't forced to run these tests when they don't have the yaml file to read.
> I'd suggest putting it with the integration tests [3].
> The other thing of note is that the Go SDK has no notion of "nested" vs
> "unnested" coders. All coders are "nested" in the Go SDK, and should have
> their lengths prefixed to them as appropriate.
> 1:
> [https://github.com/apache/beam/blob/master/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml]
> 2:
> [https://github.com/apache/beam/search?q=standard_coders.yaml&unscoped_q=standard_coders.yaml]
> 3: [https://github.com/apache/beam/tree/master/sdks/go/test]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)