[
https://issues.apache.org/jira/browse/BEAM-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786293#comment-16786293
]
Valentyn Tymofieiev edited comment on BEAM-6769 at 3/7/19 1:52 AM:
-------------------------------------------------------------------
1. Per
[https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#bytes-type],
BQ supports Bytes, so Beam BQ IO should also support bytes, or call out that
we don't support Bytes.
2. We should find out whether the ApiTools BQ client that Beam uses can accept
raw bytes. If it can, we should find out how to correctly pass raw bytes from
the user all the way to the BQ client. We may have to avoid json.dumps in this
codepath or do a workaround like: bytes -> base64 encode -> decode to str
using 'ascii' -> json.dumps -> encode from str using ascii, decode from base64
-> pass to BQ client. This may have performance implications.
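The workaround chain described above can be sketched roughly as follows. This is only an illustration using the standard library; the helper names (encode_bytes_for_json, decode_bytes_from_json) are hypothetical and are not part of Beam or the BQ client, and where the decoding step would actually live depends on what the client accepts.

```python
import base64
import json


def encode_bytes_for_json(value):
    """bytes -> base64 encode -> decode to str using 'ascii'."""
    return base64.b64encode(value).decode('ascii')


def decode_bytes_from_json(text):
    """Encode from str using 'ascii' -> decode from base64 -> raw bytes."""
    return base64.b64decode(text.encode('ascii'))


row = {'test': b'\xab\xac\xad'}

# Hypothetical pre-processing step: make every bytes value JSON-safe
# before the row reaches json.dumps.
json_safe_row = {
    key: encode_bytes_for_json(val) if isinstance(val, bytes) else val
    for key, val in row.items()
}
payload = json.dumps(json_safe_row)

# Reverse the steps after the JSON round trip to recover the raw bytes.
restored = {
    key: decode_bytes_from_json(val)
    for key, val in json.loads(payload).items()
}
```

The extra encode and decode pass over every bytes field is where the performance cost mentioned above would come from.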
We should have a test that takes a non-decodable byte-string, such as
b'\xab\xac\xad', and makes sure we can store and retrieve it without
accidental decoding by Beam, BQ or the BQ client.
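A rough sketch of what such a check could look like, assuming Python 3. InMemoryTable is a hypothetical stand-in for the real sink, not a Beam or BigQuery API; an actual integration test would write to and read from BigQuery instead.

```python
import json

# Sample value from the comment; 0xab is not a valid UTF-8 start byte.
NON_DECODABLE = b'\xab\xac\xad'


def is_utf8_decodable(data):
    """Return True if data can be decoded as UTF-8 without errors."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


def json_dumps_accepts(row):
    """Return True if json.dumps can serialize row as-is (Python 3)."""
    try:
        json.dumps(row)
    except TypeError:
        return False
    return True


class InMemoryTable(object):
    """Hypothetical stand-in for a BigQuery table, used only for illustration."""

    def __init__(self):
        self._rows = []

    def write(self, row):
        self._rows.append(dict(row))

    def read(self):
        return list(self._rows)


def round_trips(table, row):
    """Store row, read everything back, and check the row came back unchanged."""
    table.write(row)
    return row in table.read()
```

A value that fails is_utf8_decodable is useful here because any layer that silently decodes the bytes would either raise or corrupt the data, so the round-trip comparison would catch it.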
cc: [~pabloem], [~chamikara], [~altay]
> Write bytes to BigQuery in Python 3
> -----------------------------------
>
> Key: BEAM-6769
> URL: https://issues.apache.org/jira/browse/BEAM-6769
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Juta Staes
> Assignee: Juta Staes
> Priority: Minor
>
> In Python 2 you could write bytes data to BigQuery. This is tested in
>
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py#L186]
> Python 3 does not support
> {noformat}
> json.dumps({'test': b'test'})
> {noformat}
> which is used to encode the data in
>
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L959]
>
> How should writing bytes to BigQuery be handled in Python 3?
> * Forbid writing bytes into BigQuery on Python 3
> * Guess the encoding (utf-8?)
> * Pass the encoding to BigQuery
> cc: [~tvalentyn]