[GitHub] [beam] tvalentyn commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

GitBox Wed, 25 Mar 2020 16:30:16 -0700

tvalentyn commented on a change in pull request #11086: [BEAM-8910] Make custom 
BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#discussion_r398232108


 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py
 ##########
 @@ -236,11 +251,12 @@ def create_table(cls, table_name):
     cls.bigquery_client.insert_rows(
         cls.project, cls.dataset_id, table_name, table_data)
 
-  def get_expected_data(self):
+  def get_expected_data(self, native=True):
+    byts = b'\xab\xac'
     expected_row = {
         'float': 0.33,
         'numeric': Decimal('10'),
-        'bytes': base64.b64encode(b'\xab\xac'),
+        'bytes': base64.b64encode(byts) if native else byts,
 
 Review comment:
   The treatment of bytes should be called out in the IO doc; we do mention it here: 
https://github.com/apache/beam/blob/8bc2880cca40c00a96623b3ce96ea0b856af76c9/sdks/python/apache_beam/io/gcp/bigquery.py#L227.
 
   
   Base64 encoding may be unnecessary and less efficient. I think the native IO 
base64-encodes bytes because that was the behavior in the Java SDK. We can 
argue that it's not necessary. However, I am concerned about the consistency 
of the UX here: a different UX for the two transforms means they will not be 
interchangeable, and users might overlook this. This might also cause 
friction in cross-language pipelines.
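
   To illustrate the interchangeability concern, here is a minimal sketch that 
uses only the standard library. It assumes, as the test diff above suggests, 
that the native source yields base64-encoded bytes while the Avro-based source 
yields raw bytes. `normalize_bytes_field` is a hypothetical helper, not part of 
Beam; it only shows the extra step a user would need when swapping one 
transform for the other.

```python
import base64

# Same payload as in the test above.
RAW = b'\xab\xac'

# Assumed shapes of the 'bytes' field, per get_expected_data() in the diff:
native_value = base64.b64encode(RAW)  # native source: base64-encoded bytes
avro_value = RAW                      # Avro-based source: raw bytes


def normalize_bytes_field(value, is_base64):
    """Hypothetical helper: return raw bytes whichever source produced them."""
    return base64.b64decode(value) if is_base64 else value


# The two sources disagree on the same underlying data...
assert native_value != avro_value

# ...so downstream code must know which source it is reading from.
assert (normalize_bytes_field(native_value, is_base64=True)
        == normalize_bytes_field(avro_value, is_base64=False))
```

   A user who switches sources without adjusting this step would silently get 
differently shaped values for the same column, which is the kind of 
inconsistency that is easy to overlook.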

