alvarowolfx commented on PR #421: URL: https://github.com/apache/arrow-go/pull/421#issuecomment-3005596960
> Can't this logic just shift to only do the check on the Schema message then?

But the schema message would go on every request, so I'm not sure I follow the logic here. The idea is to avoid sending the schema on every request.

To add another point, PyArrow allows for reading the schema and record batches separately in IPC format: https://cloud.google.com/bigquery/docs/write-api-streaming#arrow-format

```python
import pyarrow

from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types as gapic_types
from google.cloud.bigquery_storage_v1.writer import AppendRowsStream


def append_rows_with_pyarrow(
    pyarrow_table: pyarrow.Table,
    project_id: str,
    dataset_id: str,
    table_id: str,
):
    bqstorage_write_client = bigquery_storage_v1.BigQueryWriteClient()

    # Create request_template with the serialized schema; it is sent
    # once when the stream is opened, not on every append.
    request_template = gapic_types.AppendRowsRequest()
    request_template.write_stream = (
        f"projects/{project_id}/datasets/{dataset_id}/tables/{table_id}/_default"
    )
    arrow_data = gapic_types.AppendRowsRequest.ArrowData()
    arrow_data.writer_schema.serialized_schema = (
        pyarrow_table.schema.serialize().to_pybytes()
    )
    request_template.arrow_rows = arrow_data

    # Create AppendRowsStream.
    append_rows_stream = AppendRowsStream(
        bqstorage_write_client,
        request_template,
    )

    # Create a request carrying only the serialized record batch.
    request = gapic_types.AppendRowsRequest()
    request.arrow_rows.rows.serialized_record_batch = (
        pyarrow_table.to_batches()[0].serialize().to_pybytes()
    )

    # Send request.
    future = append_rows_stream.send(request)

    # Wait for result.
    future.result()
```
