[ 
https://issues.apache.org/jira/browse/BEAM-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17139101#comment-17139101
 ] 

Robert Burke commented on BEAM-7009:
------------------------------------

[~lcwik]
TIL that the ISO_8859_1 encoded examples are not encoded as ISO_8859_1 strings. 
They're ordinary bytes, *interpreted* as ISO_8859_1 and written out that way. 
The Java String constructor with a charset takes in bytes and the specified 
way to interpret those bytes, not the way one wishes to encode them. To 
extract ISO_8859_1 bytes from a Java string, one must build the string normally 
and then call getBytes(charset). The API is being used backwards.

I was converting the raw bytes to UTF8 by decoding an ISO_8859_1 representation 
of the strings, but instead I need to "encode" the bytes I read in to 
ISO_8859_1 in order to get the native Go representation (UTF8).
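
To make the direction concrete, here is a minimal Go sketch of the two 
conversions, assuming each character read from the yaml stands for exactly one 
byte (code points U+0000 through U+00FF). The helper names latin1Encode and 
latin1Decode and the sample value are hypothetical, chosen only for 
illustration; they are not taken from the Go SDK or from standard_coders.yaml.

```go
package main

import "fmt"

// latin1Encode (hypothetical helper) maps each rune of s to the single byte
// with the same numeric value, which is what "encoding to ISO_8859_1" means.
// Runes above U+00FF have no ISO_8859_1 representation and cause an error.
func latin1Encode(s string) ([]byte, error) {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		if r > 0xFF {
			return nil, fmt.Errorf("rune %q is outside ISO_8859_1", r)
		}
		out = append(out, byte(r))
	}
	return out, nil
}

// latin1Decode (hypothetical helper) is the inverse: every byte becomes the
// rune with the same value, and the result is an ordinary UTF8 Go string.
func latin1Decode(b []byte) string {
	rs := make([]rune, len(b))
	for i, c := range b {
		rs[i] = rune(c)
	}
	return string(rs)
}

func main() {
	// Illustrative value only: two characters standing for bytes 0xAC, 0x02.
	text := "\u00ac\u0002"

	raw, err := latin1Encode(text)
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", raw)          // ac 02 (the bytes the text stands for)
	fmt.Printf("% x\n", []byte(text)) // c2 ac 02 (the UTF8 bytes of the text itself)
	fmt.Println(latin1Decode(raw) == text)
}
```

The second Printf shows why taking the UTF8 bytes of the yaml text directly 
gives the wrong answer: any character above U+007F expands to two bytes.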

I'm assuming this is a mistake, as it's not clear to me why one would do that 
intentionally.

This explains why I'm seeing the Python code "encode" everything instead of 
"decode" everything, which is very confusing.

> Add Go SDK test for Standard Coders using the yaml data.
> --------------------------------------------------------
>
>                 Key: BEAM-7009
>                 URL: https://issues.apache.org/jira/browse/BEAM-7009
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-model, sdk-go
>            Reporter: Robert Burke
>            Assignee: Robert Burke
>            Priority: P1
>              Labels: stale-P2
>
> The Go SDK doesn't currently do this validation against the standard yaml 
> file. [1]
> The Java and Python equivalents of the test can be found from here [2].
>  
> Care would need to be taken so that Beam Go SDK users (such as they are) 
> aren't forced to run the tests when they don't have the yaml file to read. 
> I'd suggest putting it with the integration tests [3].
> The other thing of note is that the Go SDK has no notion of "nested" vs 
> "unnested" coders. All coders are "nested" in the Go SDK, and should have 
> their lengths prefixed to them as appropriate.
> 1: 
> [https://github.com/apache/beam/blob/master/model/fn-execution/src/main/resources/org/apache/beam/model/fnexecution/v1/standard_coders.yaml]
> 2: 
> [https://github.com/apache/beam/search?q=standard_coders.yaml&unscoped_q=standard_coders.yaml]
> 3: 
> [https://github.com/apache/beam/tree/master/sdks/go/test]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
