[
https://issues.apache.org/jira/browse/BEAM-7326?focusedWorklogId=261969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-261969
]
ASF GitHub Bot logged work on BEAM-7326:
----------------------------------------
Author: ASF GitHub Bot
Created on: 18/Jun/19 02:19
Start Date: 18/Jun/19 02:19
Worklog Time Spent: 10m
Work Description: tvalentyn commented on pull request #8873: [BEAM-7326] add documentation bigquery data types
URL: https://github.com/apache/beam/pull/8873#discussion_r294585292
##########
File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
##########
@@ -142,6 +143,34 @@
* <li>[{@code dataset_id}].[{@code table_id}]
* </ul>
*
+ * <h3>BigQuery Concepts</h3>
+ *
+ * <p>Tables have rows ({@link TableRow}) and each row has cells ({@link TableCell}). A table has a
+ * schema ({@link TableSchema}), which in turn describes the schema of each cell ({@link
+ * TableFieldSchema}). The terms field and cell are used interchangeably.
+ *
+ * <p>{@link TableSchema}: describes the schema (types and order) for values in each row. It has one
+ * attribute, ‘fields’, which is list of {@link TableFieldSchema} objects.
+ *
+ * <p>{@link TableFieldSchema}: describes the schema (type, name) for one field. It has several
+ * attributes, including 'name' and 'type'. Common values for the type attribute are: 'STRING',
+ * 'INTEGER', 'FLOAT', 'BOOLEAN', 'NUMERIC', 'GEOGRAPHY'. All possible values are described at: <a
+ * href="https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types">
+ * https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types</a>
+ *
+ * <p>{@link TableRow}: Holds all values in a table row. Has one attribute, 'f', which is a list of
+ * {@link TableCell} instances.
+ *
+ * <p>{@link TableCell}: Holds the value for one cell (or field). Has one attribute, 'v', which is
+ * the value of the table cell.
+ *
+ * <p>As of Beam 2.7.0, the NUMERIC data type is supported. This data type supports high-precision
+ * decimal numbers (precision of 38 digits, scale of 9 digits). The GEOGRAPHY data type works with
+ * Well-Known Text (See <a href="https://en.wikipedia.org/wiki/Well-known_text">
+ * https://en.wikipedia.org/wiki/Well-known_text</a>) format for reading and writing to BigQuery.
+ * The BYTES data type requires that bytes are encoded using base64 encoding when writing to
Review comment:
How about the following wording:
```BigQuery IO requires values of the BYTES data type to be base64-encoded
when writing to BigQuery. When bytes are read from BigQuery, they are
returned as base64-encoded strings.```
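
To illustrate the round trip described in the suggested wording, here is a minimal sketch using only java.util.Base64. It is not taken from the PR. The class name Base64BytesSketch, the helper methods, the field name "payload", and the use of a plain Map as a stand-in for a TableRow are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class Base64BytesSketch {

  // Encode raw bytes as the base64 string a BYTES field would carry on write.
  public static String encodeForWrite(byte[] raw) {
    return Base64.getEncoder().encodeToString(raw);
  }

  // Decode the base64 string read back from a BYTES field into raw bytes.
  public static byte[] decodeAfterRead(String encoded) {
    return Base64.getDecoder().decode(encoded);
  }

  public static void main(String[] args) {
    byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);

    // Hypothetical stand-in for a TableRow; "payload" is an illustrative field name.
    Map<String, Object> row = new HashMap<>();
    row.put("payload", encodeForWrite(payload));

    // On the read side, the value arrives as a base64 string and is decoded by the user.
    byte[] roundTripped = decodeAfterRead((String) row.get("payload"));
    System.out.println(row.get("payload"));
    System.out.println(Arrays.equals(payload, roundTripped));
  }
}
```

In a real pipeline the encoded string would presumably be set on a TableRow field whose TableFieldSchema type is 'BYTES'; the Map here only keeps the sketch free of external dependencies.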
Issue Time Tracking
-------------------
Worklog Id: (was: 261969)
Time Spent: 40m (was: 0.5h)
> Document that Beam BigQuery IO expects users to pass base64-encoded bytes,
> and BQ IO serves base64-encoded bytes to the user.
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: BEAM-7326
> URL: https://issues.apache.org/jira/browse/BEAM-7326
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp, io-python-gcp
> Reporter: Valentyn Tymofieiev
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> BYTES is one of the data types supported by Google Cloud BigQuery and by the
> Apache Beam BigQuery IO connector.
> The current implementation of the BigQuery connector in the Java and Python
> SDKs expects users to base64-encode bytes before passing them to BigQuery IO;
> see the discussion on dev: [1]
> This needs to be reflected in the public documentation; see [2-4]
> cc: [~juta] [~chamikara] [~pabloem]
> cc: [~lostluck] [~kedin] FYI and to advise whether similar action needs to be
> done for Go SDK and/or Beam SQL.
> [1] https://lists.apache.org/thread.html/f35c836887014e059527ed1a806e730321e2f9726164a3030575f455@%3Cdev.beam.apache.org%3E
> [2] https://beam.apache.org/documentation/io/built-in/google-bigquery/
> [3] https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.io.gcp.bigquery.html
> [4] https://beam.apache.org/releases/javadoc/2.12.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html