rszper commented on code in PR #29507:
URL: https://github.com/apache/beam/pull/29507#discussion_r1414403139
##########
learning/prompts/documentation-lookup/01_basic_learning_apache_beam.md:
##########
@@ -0,0 +1,14 @@
+Prompt:
+What is Apache Beam?
+Response:
+Apache Beam is an open-source unified programming model for data processing pipelines. It provides a set of language-specific SDKs that allow you to define and execute data processing workflows.
+Apache Beam allows you to write pipelines in one of the supported languages (Java, Python, Go, and Scala). The pipelines can be executed on one of the supported runners (Apache Flink, Apache Spark, and Google Cloud Dataflow).

Review Comment:
Do we need line breaks between these paragraphs?

##########
learning/prompts/documentation-lookup/20_io_biguery.md:
##########
@@ -0,0 +1,39 @@
+Prompt:
+Is BigQuery supported in Apache Beam?
+Response:
+Yes, Apache Beam supports BigQuery. [BigQuery](https://cloud.google.com/bigquery) is a serverless and cost-effective enterprise data warehouse offered by [Google Cloud](https://cloud.google.com/). Apache Beam provides a `BigQueryIO` connector to read and write data to and from BigQuery. The `BigQueryIO` connector supports both batch and streaming pipelines.
+
+The following Apache Beam SDKs support the `BigQueryIO` connector:
+* [Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html)
+* [Python](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.bigquery.html)
+* [Go](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/bigqueryio) native and through [X Language](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/io/xlang/bigqueryio)
+* [Typescript](https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/io/bigqueryio.ts) through X Language
+
+To read data from BigQuery, use `ReadFromBigQuery`. Apache Beam can read data directly from a BigQuery table or using a SQL query. The default mode is to return table rows read from a BigQuery source as dictionaries. Built-in `TableRow` objects can also be returned.
+
+Apache Beam pipeline code for reading from BigQuery might look like the following example:
+
+```python
+from apache_beam.io.gcp.bigquery import ReadFromBigQuery
+
+with beam.Pipeline(options=options) as p:
+    # read from a table
+    lines_table = p | 'ReadTable' >> ReadFromBigQuery(table=table)
+    # read from a query
+    lines_query = p | 'ReadQuery' >> ReadFromBigQuery(query="SELECT * FROM table")
+
+```
+Apache Beam pipeline code for writing to BigQuery might look like the following example:
+
+```python
+from apache_beam.io.gcp.bigquery import WriteToBigQuery
+
+with beam.Pipeline(options=options) as p:
+    # write to a table
+    p | 'Write' >> beam.io.WriteToBigQuery(
+        table,
+        schema=TABLE_SCHEMA,
+        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
+        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
+```
+To get started with BigQuery and Apache Beam, see the examples in the [Dataflow-cookbook repository](https://github.com/GoogleCloudPlatform/dataflow-cookbook) in GitHub. For Python read examples, see the [read_table_bigquery.py](https://github.com/GoogleCloudPlatform/dataflow-cookbook/blob/main/Python/bigquery/read_table_bigquery.py) example. For Python write examples, see the [write_bigquery.py](https://github.com/GoogleCloudPlatform/dataflow-cookbook/blob/main/Python/bigquery/write_bigquery.py) example.

Review Comment:
```suggestion
To get started with BigQuery and Apache Beam, see the examples in the [Dataflow cookbook repository](https://github.com/GoogleCloudPlatform/dataflow-cookbook) in GitHub. For Python read examples, see the [read_table_bigquery.py](https://github.com/GoogleCloudPlatform/dataflow-cookbook/blob/main/Python/bigquery/read_table_bigquery.py) example. For Python write examples, see the [write_bigquery.py](https://github.com/GoogleCloudPlatform/dataflow-cookbook/blob/main/Python/bigquery/write_bigquery.py) example.
```
