[ 
https://issues.apache.org/jira/browse/BEAM-7326?focusedWorklogId=261971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-261971
 ]

ASF GitHub Bot logged work on BEAM-7326:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Jun/19 02:19
            Start Date: 18/Jun/19 02:19
    Worklog Time Spent: 10m 
      Work Description: tvalentyn commented on pull request #8873: [BEAM-7326] 
add documentation bigquery data types
URL: https://github.com/apache/beam/pull/8873#discussion_r294586205
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##########
 @@ -224,6 +224,9 @@ def compute_table_name(row):
 The GEOGRAPHY data type works with Well-Known Text (See
 https://en.wikipedia.org/wiki/Well-known_text) format for reading and writing
 to BigQuery.
+The BYTES data type requires that bytes are encoded using base64 encoding when
 
 Review comment:
   How about the following wording:
   ```
   BigQuery IO requires values of BYTES datatype to be encoded using base64
   encoding when writing to BigQuery. When bytes are read from BigQuery they are
   returned as base64-encoded bytes.
   ```
   Note: it looks like in Java bytes are returned as strings, but in Python as
   bytes, so the Python wording needs to differ from the Java wording.
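
   To illustrate the proposed wording, here is a minimal sketch of the round
   trip using only the Python standard library. The helper names
   (encode_bytes_field, decode_bytes_field) and the sample row are hypothetical
   and are not part of the Beam API; the sketch assumes rows are plain dicts
   and that the connector accepts the bytes object returned by
   base64.b64encode. Depending on the SDK and Python version, a str may be
   needed instead, which `.decode('ascii')` would provide.

```python
import base64


def encode_bytes_field(row, field):
    """Return a copy of `row` with `row[field]` base64-encoded (hypothetical helper)."""
    encoded = dict(row)
    encoded[field] = base64.b64encode(row[field])
    return encoded


def decode_bytes_field(row, field):
    """Return a copy of `row` with `row[field]` base64-decoded (hypothetical helper)."""
    decoded = dict(row)
    decoded[field] = base64.b64decode(row[field])
    return decoded


# Sample row with a BYTES-like column; the column names are made up.
raw_row = {'id': 1, 'payload': b'\x00\x01binary'}

# Before writing: base64-encode the raw bytes.
to_write = encode_bytes_field(raw_row, 'payload')

# After reading: values arrive base64-encoded, so decode to recover raw bytes.
round_tripped = decode_bytes_field(to_write, 'payload')
```

   In a pipeline, helpers like these would typically be applied with a Map
   step before the write and after the read.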
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 261971)
    Time Spent: 1h  (was: 50m)

> Document that Beam BigQuery IO expects users to pass base64-encoded bytes, 
> and BQ IO serves base64-encoded bytes to the user.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-7326
>                 URL: https://issues.apache.org/jira/browse/BEAM-7326
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp, io-python-gcp
>            Reporter: Valentyn Tymofieiev
>            Priority: Major
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> BYTES is one of the data types supported by Google Cloud BigQuery and the
> Apache Beam BigQuery IO connector.
> The current implementation of the BigQuery connector in the Java and Python
> SDKs expects users to base64-encode bytes before passing them to BigQuery IO;
> see the discussion on dev: [1]
> This needs to be reflected in the public documentation; see [2-4]
> cc: [~juta] [~chamikara] [~pabloem] 
> cc: [~lostluck] [~kedin] FYI and to advise whether similar action needs to be 
> done for Go SDK and/or Beam SQL.
> [1] 
> https://lists.apache.org/thread.html/f35c836887014e059527ed1a806e730321e2f9726164a3030575f455@%3Cdev.beam.apache.org%3E
> [2] https://beam.apache.org/documentation/io/built-in/google-bigquery/
> [3] 
> https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.io.gcp.bigquery.html
> [4] 
> https://beam.apache.org/releases/javadoc/2.12.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
