pabloem commented on code in PR #21872: URL: https://github.com/apache/beam/pull/21872#discussion_r908773930
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -235,53 +235,39 @@ def compute_table_name(row):
 [2] https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/insert
 [3] https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#resource
-Output of the WriteToBigQuery transform
+Chaining of operations after WriteToBigQuery
 ---------------------------------------
+WritToBigQuery returns an object with several PCollections that consist of
+metadata about the write operations. These are useful to inspect the write
+operation and follow up on the results. Often, the simplest use case is to
+chain an operation after writing data to BigQuery.
-Writing to BigQuery returns a WriteResult object that includes metadata
-relating to the write you configured. This data can be used in later steps
-in your pipeline:::
-
-  schema = {'fields': [
-      {'name': 'column', 'type': 'STRING', 'mode': 'NULLABLE'}]}
-
-  error_schema = {'fields': [
-      {'name': 'destination', 'type': 'STRING', 'mode': 'NULLABLE'},
-      {'name': 'row', 'type': 'STRING', 'mode': 'NULLABLE'},
-      {'name': 'error_message', 'type': 'STRING', 'mode': 'NULLABLE'}]}
-
-  with Pipeline() as p:
-    result = (p
-      | 'Create Columns' >> beam.Create([
-          {'column': 'value'},
-          {'bad_column': 'bad_value'}
-        ])
-      | 'Write Data' >> WriteToBigQuery(
-          method=WriteToBigQuery.Method.STREAMING_INSERTS,
-          table=my_table,
-          schema=schema,
-          insert_retry_strategy=RetryStrategy.RETRY_NEVER
-        ))
-
-    _ = (result.failed_rows_with_errors
-      | 'Get Errors' >> beam.Map(lambda e: {
-          "destination": e[0],
-          "row": json.dumps(e[1]),
-          "error_message": e[2][0]['message']
-        })
-      | 'Write Errors' >> WriteToBigQuery(
-          method=WriteToBigQuery.Method.STREAMING_INSERTS,
-          table=error_log_table,
-          schema=error_schema,
-        ))

Review Comment:
   You don't need to remove this example. I quite like it. Up to you though : )

-- 
This is an automated message from the Apache Git Service.
