[
https://issues.apache.org/jira/browse/BEAM-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347803#comment-17347803
]
Matteo Martignon commented on BEAM-7826:
----------------------------------------
The same happens with Spanish characters such as 'ñ' or accented vowels like
'á' and 'é'. The default charset on Dataflow workers is US-ASCII.
The method {{getBytes(StandardCharsets.UTF_8)}} does not actually return a
UTF-8 encoded byte array.
Example: [https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/d2b43a5a19a1484833ea13761e6843b5b7d3328f/src/main/java/com/google/cloud/teleport/templates/common/BigQueryConverters.java#L33|https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/d2b43a5a19a1484833ea13761e6843b5b7d3328f/src/main/java/com/google/cloud/teleport/templates/common/BigQueryConverters.java#L333]
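A minimal sketch for checking this claim on a worker or locally, using only JDK calls. The class and method names (CharsetCheck, encodeUtf8, decodeAsAscii, decodeUtf8) are hypothetical and not taken from Beam or the template. On a standard JVM, an explicit-charset getBytes call should not depend on the default charset, so if the output differs on a worker, the corruption may instead come from a decoding step that falls back to the default charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetCheck {
    // Encodes with an explicit charset; on a standard JVM the result
    // does not depend on the default charset.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes as US-ASCII, which is what new String(bytes) would do
    // if the JVM default charset were US-ASCII (assumption from the comment above).
    static String decodeAsAscii(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Decodes with an explicit UTF-8 charset.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Report what the current JVM actually uses as its default.
        System.out.println("Default charset: " + Charset.defaultCharset());

        String original = "\u00f1"; // 'ñ', written as an escape to avoid source-encoding issues
        byte[] utf8 = encodeUtf8(original);
        System.out.println("UTF-8 bytes: " + Arrays.toString(utf8));

        // Round trip with an explicit charset preserves the character.
        System.out.println("Round trip ok: " + decodeUtf8(utf8).equals(original));

        // Decoding the same bytes as US-ASCII yields replacement characters.
        System.out.println("Decoded as US-ASCII: " + decodeAsAscii(utf8));
    }
}
```

If the round trip succeeds but the data still arrives damaged, it may be worth looking for calls such as new String(bytes) or String.getBytes() without a charset argument in the pipeline.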
> Problem loading ISO-8859-1 into BigQuery using DataFlow
> -------------------------------------------------------
>
> Key: BEAM-7826
> URL: https://issues.apache.org/jira/browse/BEAM-7826
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp, io-java-text
> Affects Versions: 2.8.0
> Reporter: Israel Gómez
> Priority: P3
>
> Hi all,
> I'm trying to load an ISO-8859-1 file into BigQuery using DataFlow. I've
> built a template with Apache Beam Java. Everything works well, but when I
> check the content of the BigQuery table I see that some characters like 'ñ'
> or accents 'á', 'é', etc. haven't been stored properly; they have been stored
> as �.
> I've tried several charset conversions before writing to BigQuery. I've also
> created a special ISOCoder and passed it to the pipeline using the method
> setCoder(), but nothing works.
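
For reference, a minimal JDK-only sketch of the symptom described in the issue, assuming the raw ISO-8859-1 bytes are decoded as UTF-8 somewhere before any charset conversion runs. The names Latin1Transcode, decodeLatin1, decodeAsUtf8 and latin1ToUtf8 are hypothetical. It does not show Beam APIs; it only illustrates why a conversion applied after the first decode cannot recover the character.

```java
import java.nio.charset.StandardCharsets;

class Latin1Transcode {
    // Decodes raw bytes with the charset the file was actually written in.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Decodes the same bytes as UTF-8, which is the wrong charset for this input.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Transcodes ISO-8859-1 bytes to UTF-8 bytes by decoding first, then re-encoding.
    static byte[] latin1ToUtf8(byte[] raw) {
        return decodeLatin1(raw).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xF1}; // 'ñ' in ISO-8859-1

        // Correct decode keeps the character.
        System.out.println(decodeLatin1(raw).equals("\u00f1"));

        // Wrong decode produces U+FFFD; the original byte is lost from here on.
        System.out.println(decodeAsUtf8(raw).contains("\uFFFD"));

        // Transcoding the raw bytes gives the two-byte UTF-8 form of 'ñ'.
        System.out.println(latin1ToUtf8(raw).length);
    }
}
```

Under this assumption, the conversion has to be applied to the raw bytes, before they are first turned into a String.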
--
This message was sent by Atlassian Jira
(v8.3.4#803005)