[PR] BigQueryIO uniformize direct and export reads [beam]

via GitHub Thu, 29 Aug 2024 06:14:01 -0700


RustedBones opened a new pull request, #32360:
URL: https://github.com/apache/beam/pull/32360


   Refers to #26329, also fixes #20100
   
   When using `readWithDatumReader` with the `DIRECT_READ` method, the transform 
would fail because a `parseFn` is expected. Refactor the IO so the Avro 
`datumReader` can be used in both cases.
   
   In some cases, the data must be read with a desired schema. Currently, 
BigQueryIO always uses the writer schema (or table schema). Create new 
APIs to set the reader schema.
   
   This refactoring contains some breaking changes:
   
   `withFormat` is no longer exposed, because it is not possible to configure 
a `TypedRead` with a `DatumReaderFactory` and change the format later. The data 
format MUST be chosen when creating the transform.
   
   In `TypedRead.Builder`, replace the `DatumReaderFactory` with a 
`BigQueryReaderFactory`, which handles both Avro and Arrow in a uniform 
fashion. This alters `BigQueryIOTranslation`.
   I need some help on that point to handle it better.
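   
   To illustrate the design constraint behind the two breaking changes, here is a minimal, dependency-free sketch. It is not the code in this PR and does not use Beam's API: `UniformReadSketch`, `DataFormat`, `ReaderFactory`, `avro`, `arrow`, and the nested `TypedRead` are hypothetical stand-ins. It only shows why the format has to be fixed when the transform is created once the format and the decoder travel together in a single factory.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Illustration only: every type here is a hypothetical stand-in, not Beam's API.
public class UniformReadSketch {

    // Stand-in for the wire formats a read can return.
    enum DataFormat { AVRO, ARROW }

    // Stand-in for a uniform reader factory: it carries the format and the
    // decoder together, so the two cannot get out of sync.
    interface ReaderFactory<T> {
        DataFormat format();
        T read(byte[] payload);
    }

    static <T> ReaderFactory<T> of(DataFormat fmt, Function<byte[], T> decode) {
        return new ReaderFactory<T>() {
            @Override public DataFormat format() { return fmt; }
            @Override public T read(byte[] payload) { return decode.apply(payload); }
        };
    }

    // The format is chosen here, at creation time.
    static <T> ReaderFactory<T> avro(Function<byte[], T> decode) {
        return of(DataFormat.AVRO, decode);
    }

    static <T> ReaderFactory<T> arrow(Function<byte[], T> decode) {
        return of(DataFormat.ARROW, decode);
    }

    // Stand-in for the read transform: it derives its format from the factory
    // and deliberately offers no setter to change it afterwards.
    static final class TypedRead<T> {
        private final ReaderFactory<T> factory;

        private TypedRead(ReaderFactory<T> factory) { this.factory = factory; }

        static <T> TypedRead<T> from(ReaderFactory<T> factory) {
            return new TypedRead<>(factory);
        }

        DataFormat format() { return factory.format(); }

        T decode(byte[] payload) { return factory.read(payload); }
    }

    public static void main(String[] args) {
        // A toy decoder that treats the payload as UTF-8 text.
        TypedRead<String> read =
            TypedRead.from(avro(b -> new String(b, StandardCharsets.UTF_8)));
        byte[] payload = "row".getBytes(StandardCharsets.UTF_8);
        System.out.println(read.format() + " " + read.decode(payload)); // prints "AVRO row"
    }
}
```

   In this sketch, a `withFormat`-style setter on `TypedRead` would let the declared format disagree with the decoder that was already supplied, which is the inconsistency the change avoids.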


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
