[ https://issues.apache.org/jira/browse/BEAM-10785?focusedWorklogId=777317&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777317 ]
ASF GitHub Bot logged work on BEAM-10785:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 02/Jun/22 07:23
Start Date: 02/Jun/22 07:23
Worklog Time Spent: 10m
Work Description: harrydrippin commented on PR #17518:
URL: https://github.com/apache/beam/pull/17518#issuecomment-1144535142
@pabloem In my case the problem occurred when I was processing chat data
that included emojis and writing it to BigQuery (the emojis were all replaced
with the replacement character), so our main need was to switch
`ensure_ascii` from `True` to `False` in `json.dumps()`. There was no
exposed control for that argument, so I temporarily customized
`RowAsDictJsonCoder` and `WriteToBigQuery` in my environment as below:
```python
class CustomRowAsDictJsonCoder(coders.Coder):
    def encode(self, table_row):
        try:
            # ...
            # The only changed part is ensure_ascii=False.
            return json.dumps(table_row, ensure_ascii=False,
                              default=default_encoder).encode("utf-8")
        # except: ...
```
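For reference, the effect of the flag can be reproduced with the standard library alone. This is only a minimal sketch: the `row` dict is made-up sample data, not taken from the pipeline, and it says nothing about how BigQuery itself handles either form.

```python
import json

# Hypothetical sample row; the emoji is U+1F600.
row = {"message": "hello 😀"}

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes,
# so the emoji is written as an escaped surrogate pair.
escaped = json.dumps(row).encode("utf-8")

# ensure_ascii=False: the character is kept as-is and encoded as UTF-8 bytes.
raw = json.dumps(row, ensure_ascii=False).encode("utf-8")

print(escaped)
print(raw)

# Python's own json module decodes both forms to the same value.
assert json.loads(escaped) == json.loads(raw) == row
```

Both byte strings are valid JSON for the same row; they differ only in whether the emoji is escaped or stored as raw UTF-8.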
I would also prefer not to define any additional parameters if possible, but
I don't think we currently have any way to modify the parameters inside the
coder or to replace the coder. Please let me know if you have any concerns
about this.
Issue Time Tracking
-------------------
Worklog Id: (was: 777317)
Time Spent: 1h 50m (was: 1h 40m)
> Support for coder argument in WriteToBigQuery
> ---------------------------------------------
>
> Key: BEAM-10785
> URL: https://issues.apache.org/jira/browse/BEAM-10785
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Reporter: Nakamura Yu
> Assignee: Seunghwan Hong
> Priority: P1
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> When using WriteToBigQuery to transfer data to BigQuery, non-ascii characters
> are replaced with replacement characters.
> This was due to the RowAsDictJsonCoder being set as the coder for the
> BigQueryBatchFileLoads called inside WriteToBigQuery.
> I want to add a coder argument to WriteToBigQuery so that I can set a
> coder other than RowAsDictJsonCoder.
> If there is no problem, I will create a Pull Request next weekend.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)