tvalentyn commented on a change in pull request #11086: [BEAM-8910] Make custom
BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r398232108
##########
File path: sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py
##########
@@ -236,11 +251,12 @@ def create_table(cls, table_name):
     cls.bigquery_client.insert_rows(
         cls.project, cls.dataset_id, table_name, table_data)
-  def get_expected_data(self):
+  def get_expected_data(self, native=True):
+    byts = b'\xab\xac'
     expected_row = {
         'float': 0.33,
         'numeric': Decimal('10'),
-        'bytes': base64.b64encode(b'\xab\xac'),
+        'bytes': base64.b64encode(byts) if native else byts,
Review comment:
The treatment of bytes should be called out in the IO documentation. We already mention it here:
https://github.com/apache/beam/blob/8bc2880cca40c00a96623b3ce96ea0b856af76c9/sdks/python/apache_beam/io/gcp/bigquery.py#L227.
Base64 encoding may be unnecessary, and it is less efficient. I think the
native IO encodes bytes with base64 because that was the behavior in the Java SDK.
We can argue that this is not necessary. However, I am concerned about the consistency
of the UX here. A different UX for the two transforms means they will not
be interchangeable, and users might overlook the difference. This might also cause
friction in cross-language pipelines.
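
To make the inconsistency concrete, here is a minimal sketch using only the standard library. `expected_bytes` mirrors the conditional in the diff above; `to_raw_bytes` is a hypothetical helper (not part of Beam) showing what a user would have to do to treat the two transforms interchangeably.

```python
import base64

RAW = b'\xab\xac'


def expected_bytes(raw, native=True):
  """Mirrors the test diff: base64 for the native source, raw bytes otherwise."""
  return base64.b64encode(raw) if native else raw


def to_raw_bytes(value, native=True):
  """Hypothetical helper: undo base64 so both paths yield the same value."""
  return base64.b64decode(value) if native else value


native_value = expected_bytes(RAW, native=True)   # base64-encoded bytes
custom_value = expected_bytes(RAW, native=False)  # raw bytes, unchanged

# The same BYTES cell surfaces as two different Python values.
print(native_value, custom_value)

# Only after decoding the native value do the two agree.
print(to_raw_bytes(native_value, True) == to_raw_bytes(custom_value, False))
```

A pipeline that swaps one transform for the other without such a step would silently see different values for the same column, which is the interchangeability concern.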