alvarowolfx commented on PR #421: URL: https://github.com/apache/arrow-go/pull/421#issuecomment-3005596960
> Can't this logic just shift to only do the check on the Schema message then?

But the schema message would go on every request, so I'm not sure I follow the logic here. The idea is to avoid sending the schema on every request.

To add another point, PyArrow allows for reading the schema and record batches separately in IPC format: https://cloud.google.com/bigquery/docs/write-api-streaming#arrow-format

```python
import pyarrow

from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types as gapic_types
from google.cloud.bigquery_storage_v1.writer import AppendRowsStream


def append_rows_with_pyarrow(
    pyarrow_table: pyarrow.Table,
    project_id: str,
    dataset_id: str,
    table_id: str,
):
    bqstorage_write_client = bigquery_storage_v1.BigQueryWriteClient()

    # Create request_template with the serialized schema; it is sent
    # once when the stream is opened, not on every append.
    request_template = gapic_types.AppendRowsRequest()
    request_template.write_stream = (
        f"projects/{project_id}/datasets/{dataset_id}/tables/{table_id}/_default"
    )
    arrow_data = gapic_types.AppendRowsRequest.ArrowData()
    arrow_data.writer_schema.serialized_schema = (
        pyarrow_table.schema.serialize().to_pybytes()
    )
    request_template.arrow_rows = arrow_data

    # Create AppendRowsStream.
    append_rows_stream = AppendRowsStream(
        bqstorage_write_client,
        request_template,
    )

    # Create a request carrying only the serialized record batch.
    request = gapic_types.AppendRowsRequest()
    request.arrow_rows.rows.serialized_record_batch = (
        pyarrow_table.to_batches()[0].serialize().to_pybytes()
    )

    # Send request.
    future = append_rows_stream.send(request)

    # Wait for result.
    future.result()
```
