tvalentyn commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro URL: https://github.com/apache/beam/pull/11086#discussion_r398232108
##########
File path: sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py
##########
@@ -236,11 +251,12 @@ def create_table(cls, table_name):
     cls.bigquery_client.insert_rows(
         cls.project, cls.dataset_id, table_name, table_data)

-  def get_expected_data(self):
+  def get_expected_data(self, native=True):
+    byts = b'\xab\xac'
     expected_row = {
         'float': 0.33,
         'numeric': Decimal('10'),
-        'bytes': base64.b64encode(b'\xab\xac'),
+        'bytes': base64.b64encode(byts) if native else byts,

Review comment:
The treatment of bytes should be called out in the IO doc. We already mention it here:
https://github.com/apache/beam/blob/8bc2880cca40c00a96623b3ce96ea0b856af76c9/sdks/python/apache_beam/io/gcp/bigquery.py#L227.

Base64 encoding may be unnecessary, and it is less efficient. I think the native IO encodes bytes with base64 because that was the behavior in the Java SDK. We can argue that it is not necessary.

However, I am concerned about the consistency of the UX here. If the two transforms return bytes differently, they will not be interchangeable, and users might overlook the difference. This might also cause friction in cross-language pipelines.
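
To make the interchangeability concern concrete, here is a minimal sketch in plain Python with no Beam or BigQuery calls. It assumes, based only on the test change above, that the native source yields the bytes field base64-encoded while the custom source yields raw bytes. The helper names `native_style_row`, `custom_style_row`, and `normalize_bytes` are hypothetical and exist only for illustration.

```python
import base64

RAW = b'\xab\xac'  # same sample value as in the test above


def native_style_row(raw):
    # Assumption: mimics the native source, which returns bytes base64-encoded.
    return {'bytes': base64.b64encode(raw)}


def custom_style_row(raw):
    # Assumption: mimics the custom source, which returns the raw bytes.
    return {'bytes': raw}


def normalize_bytes(row, base64_encoded):
    """Return a copy of `row` whose 'bytes' field holds raw bytes."""
    out = dict(row)
    if base64_encoded:
        out['bytes'] = base64.b64decode(out['bytes'])
    return out


native = native_style_row(RAW)
custom = custom_style_row(RAW)

# The same stored value surfaces differently depending on the transform.
rows_differ = native != custom

# After normalizing, downstream code sees identical rows.
rows_match = normalize_bytes(native, True) == normalize_bytes(custom, False)
```

A user who swaps one transform for the other without a step like `normalize_bytes` would silently get a different representation of the same column, which is the UX risk described in the comment.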