[ 
https://issues.apache.org/jira/browse/BEAM-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347803#comment-17347803
 ] 

Matteo Martignon commented on BEAM-7826:
----------------------------------------

The same happens with Spanish-specific characters such as 'ñ' and accented 
letters such as 'á' and 'é'. The default charset of the Dataflow workers is US-ASCII.
The method {{getBytes(StandardCharsets.UTF_8)}} does not actually return a 
UTF-8 encoded byte array.
 
Example: https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/d2b43a5a19a1484833ea13761e6843b5b7d3328f/src/main/java/com/google/cloud/teleport/templates/common/BigQueryConverters.java#L333
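A minimal sketch of how the replacement character '�' (U+FFFD) can end up in the output, assuming the damage happens when the raw ISO-8859-1 bytes are decoded into a String with the wrong charset (US-ASCII as the worker default, or UTF-8) before getBytes(StandardCharsets.UTF_8) is called. The class CharsetDemo and its helpers hex and toUtf8 are hypothetical names used only for illustration; only JDK classes are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Formats bytes as space-separated uppercase hex, e.g. "C3 B1".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes the raw input with the given charset, then re-encodes the
    // resulting String as UTF-8. new String(byte[], Charset) silently
    // substitutes U+FFFD for bytes the charset cannot decode.
    static byte[] toUtf8(byte[] raw, Charset inputCharset) {
        return new String(raw, inputCharset).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 'n with tilde' (U+00F1) is the single byte 0xF1 in ISO-8859-1.
        byte[] latin1 = {(byte) 0xF1};

        // US-ASCII cannot decode 0xF1, so the String already holds U+FFFD and
        // getBytes(UTF_8) correctly encodes that replacement character.
        System.out.println(hex(toUtf8(latin1, StandardCharsets.US_ASCII)));  // EF BF BD

        // A lone 0xF1 is also invalid UTF-8, with the same result.
        System.out.println(hex(toUtf8(latin1, StandardCharsets.UTF_8)));     // EF BF BD

        // Decoding with the charset the file was written in keeps the character.
        System.out.println(hex(toUtf8(latin1, StandardCharsets.ISO_8859_1))); // C3 B1
    }
}
```

Under this assumption, getBytes(StandardCharsets.UTF_8) is faithfully encoding a String that already contains U+FFFD, so the fix would be to decode with the file's real charset as early as possible rather than to re-encode afterwards.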

> Problem loading ISO-8859-1 into BigQuery using DataFlow
> -------------------------------------------------------
>
>                 Key: BEAM-7826
>                 URL: https://issues.apache.org/jira/browse/BEAM-7826
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp, io-java-text
>    Affects Versions: 2.8.0
>            Reporter: Israel Gómez
>            Priority: P3
>
> Hi all,
> I'm trying to load an ISO-8859-1 file into BigQuery using Dataflow. I've 
> built a template with Apache Beam Java. Everything works well, but when I 
> check the content of the BigQuery table I see that some characters, like 'ñ' 
> or accented letters such as 'á' and 'é', haven't been stored properly; they 
> have been stored as �.
> I've tried several charset conversions before writing into BigQuery. I've 
> also created a special ISOCoder and passed it to the pipeline using the 
> setCoder() method, but nothing works.
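A possible explanation for why a custom coder does not help, stated as an assumption: a Beam coder only controls how elements are serialized between pipeline stages, so if TextIO has already decoded each line as UTF-8, the ISO-8859-1 bytes were turned into U+FFFD before any coder runs. One workaround under that assumption is to decode the raw file bytes explicitly as ISO-8859-1. The sketch below uses only the JDK; the class Latin1Lines and the helper decodeLatin1Lines are hypothetical, and how the raw bytes are obtained inside a Beam pipeline is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Latin1Lines {
    // Hypothetical helper: decodes a raw ISO-8859-1 file body into lines.
    // ISO-8859-1 maps every byte 0x00-0xFF to exactly one char, so this step
    // can never produce U+FFFD replacement characters.
    static List<String> decodeLatin1Lines(byte[] raw) {
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        List<String> lines = new ArrayList<>();
        // Accept both LF and CRLF line endings; a trailing newline adds no empty line.
        for (String line : text.split("\r?\n")) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Two lines in ISO-8859-1: 0xF1 is 'n with tilde', 0xE9 is 'e with acute'.
        byte[] raw = {'a', (byte) 0xF1, 'o', '\n', 'J', 'o', 's', (byte) 0xE9};
        List<String> lines = decodeLatin1Lines(raw);
        System.out.println(lines.size());                   // 2
        System.out.println(lines.get(0).equals("a\u00F1o")); // true
    }
}
```

Once the lines are proper Java Strings, they can be written to BigQuery without further charset conversion, since the characters are no longer lost.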



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
