[ https://issues.apache.org/jira/browse/BEAM-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786293#comment-16786293 ]

Valentyn Tymofieiev edited comment on BEAM-6769 at 3/7/19 1:52 AM:
-------------------------------------------------------------------

1. Per
[https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#bytes-type],
BQ supports the BYTES type, so Beam's BQ IO should also support bytes, or call out
explicitly that we don't support them.

2. We should find out whether the ApiTools BQ client that Beam uses can accept raw
bytes. If it does, we should find out how to correctly pass raw bytes from the
user all the way to the BQ client. We may have to avoid json.dumps in this
codepath, or use a workaround like: bytes -> base64 encode -> decode to str
using 'ascii' -> json.dumps -> encode str to bytes using 'ascii' -> base64 decode
-> pass to BQ client. This may have performance implications: every bytes value
gets an extra encode/decode pass, and base64 inflates it by about a third.
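A minimal sketch of the base64 workaround above, using only the standard library. It assumes every bytes value in a row belongs to a BYTES column; `encode_bytes_fields` and `decode_bytes_field` are hypothetical helpers, not part of Beam's bigquery_tools. Whether the final base64 decode is needed at all depends on what the ApiTools BQ client accepts, which is the open question in point 2.

```python
import base64
import json


def encode_bytes_fields(row):
    """Hypothetical helper: bytes -> base64 -> ASCII str, so json.dumps accepts the row.

    Assumes every bytes value in the row belongs to a BYTES column.
    """
    return {
        key: base64.b64encode(value).decode('ascii') if isinstance(value, bytes) else value
        for key, value in row.items()
    }


def decode_bytes_field(text):
    """Hypothetical helper: ASCII str -> bytes -> base64-decoded raw bytes."""
    return base64.b64decode(text.encode('ascii'))


row = {'name': 'example', 'payload': b'\xab\xac\xad'}
row_json = json.dumps(encode_bytes_fields(row))  # {"name": "example", "payload": "q6yt"}
restored = decode_bytes_field(json.loads(row_json)['payload'])
assert restored == row['payload']
```

Non-bytes values pass through unchanged, so only BYTES columns pay the extra encode/decode cost.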

We should have a test that takes a byte string that is not valid UTF-8, such as
b'\xab\xac\xad', and makes sure we can store and retrieve it without accidental
decoding by Beam, BQ, or the BQ client.
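A sketch of such a test, assuming plain unittest. It exercises only the base64/JSON encoding step, not a real BigQuery write and read, so an integration test would still be needed for the full path; `BytesRoundTripTest` is a hypothetical name.

```python
import base64
import json
import unittest


class BytesRoundTripTest(unittest.TestCase):
    # Hypothetical test: covers the encoding step only, not a BigQuery round trip.

    RAW = b'\xab\xac\xad'

    def test_fixture_is_not_valid_utf8(self):
        # Guard: the fixture must be bytes that a UTF-8 decode would reject.
        with self.assertRaises(UnicodeDecodeError):
            self.RAW.decode('utf-8')

    def test_round_trip_preserves_bytes(self):
        # bytes -> base64 -> ASCII str -> JSON text, then back again.
        encoded = json.dumps({'payload': base64.b64encode(self.RAW).decode('ascii')})
        decoded = base64.b64decode(json.loads(encoded)['payload'])
        self.assertIsInstance(decoded, bytes)
        self.assertEqual(decoded, self.RAW)
```

Run it with any unittest-compatible runner, such as `python -m unittest` or pytest.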

cc: [~pabloem], [~chamikara], [~altay]


> Write bytes to BigQuery in Python 3
> -----------------------------------
>
>                 Key: BEAM-6769
>                 URL: https://issues.apache.org/jira/browse/BEAM-6769
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Juta Staes
>            Assignee: Juta Staes
>            Priority: Minor
>
> In Python 2 you could write bytes data to BigQuery. This is tested in
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py#L186].
> On Python 3, the call
> {noformat}
> json.dumps({'test': b'test'})
> {noformat}
> raises a TypeError, because the json module cannot serialize bytes. This call is
> used to encode the data in
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L959].
>
> How should writing bytes to BigQuery be handled in Python 3?
>  * Forbid writing bytes into BigQuery on Python 3
>  * Guess the encoding (utf-8?)
>  * Pass the encoding to BigQuery
> cc: [~tvalentyn]
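The failure described in the issue, and the limitation of the "guess the encoding (utf-8?)" option, can be reproduced with the standard library alone. A minimal sketch; `try_json_dumps` and `guess_utf8` are hypothetical helpers, not Beam or BigQuery APIs.

```python
import json


def try_json_dumps(row):
    """Hypothetical helper: return the JSON text, or the TypeError json.dumps raised."""
    try:
        return json.dumps(row)
    except TypeError as exc:
        return exc


def guess_utf8(row):
    """Hypothetical helper for the 'guess the encoding (utf-8?)' option.

    Decodes every bytes value as UTF-8; raises UnicodeDecodeError for
    byte strings that are not valid UTF-8.
    """
    return {k: v.decode('utf-8') if isinstance(v, bytes) else v for k, v in row.items()}


# On Python 3 this is a TypeError: the json module cannot serialize bytes.
print(repr(try_json_dumps({'test': b'test'})))

# Guessing UTF-8 works for ASCII-only bytes...
print(json.dumps(guess_utf8({'test': b'test'})))  # {"test": "test"}

# ...but not for arbitrary binary data.
try:
    guess_utf8({'test': b'\xab\xac\xad'})
except UnicodeDecodeError as exc:
    print('UTF-8 guess failed:', exc.reason)
```

Guessing UTF-8 therefore only covers byte strings that happen to be valid text; arbitrary binary data needs to be either forbidden or carried in an encoding-independent form.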



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
