tvalentyn commented on a change in pull request #15900:
URL: https://github.com/apache/beam/pull/15900#discussion_r745051997
##########
File path: sdks/python/apache_beam/io/avroio.py
##########
@@ -169,8 +159,8 @@ class ReadAllFromAvro(PTransform):
def __init__(
self,
min_bundle_size=0,
- desired_bundle_size=DEFAULT_DESIRED_BUNDLE_SIZE,
use_fastavro=True,
Review comment:
let's keep the order of the arguments
##########
File path: sdks/python/apache_beam/io/avroio.py
##########
@@ -297,130 +217,15 @@ def advance_file_past_next_sync_marker(f, sync_marker):
def _create_avro_source(
- file_pattern=None, min_bundle_size=0, validate=False, use_fastavro=True):
+ file_pattern=None, min_bundle_size=0, validate=False, use_fasvro=True):
Review comment:
typo in the argument name
##########
File path: sdks/python/apache_beam/examples/fastavro_it_test.py
##########
@@ -151,13 +141,7 @@ def batch_indices(start):
fastavro_records = \
fastavro_read_pipeline \
| 'create-fastavro' >> Create(['%s*' % fastavro_output]) \
- | 'read-fastavro' >> ReadAllFromAvro(use_fastavro=True) \
- | Map(lambda rec: (rec['number'], rec))
-
- avro_records = \
- fastavro_read_pipeline \
- | 'create-avro' >> Create(['%s*' % avro_output]) \
- | 'read-avro' >> ReadAllFromAvro(use_fastavro=False) \
+ | 'read-fastavro' >> ReadAllFromAvro() \
Review comment:
The mechanics of this test was to run a pipeline with Avro and with
FastAvro, and then check that the output was the same. Now we don't have Avro
implementation, so such test scenario wouldn't work. We could instead verify
that the checksum of the values that have bean read matches some pre-computed
value.
##########
File path: sdks/python/apache_beam/examples/fastavro_it_test.py
##########
@@ -124,22 +123,13 @@ def batch_indices(start):
| 'create-records' >> Map(record)
fastavro_output = '/'.join([self.output, 'fastavro'])
- avro_output = '/'.join([self.output, 'avro'])
+ # avro_output = '/'.join([self.output, 'avro'])
Review comment:
remove the commented out code
##########
File path: sdks/python/apache_beam/io/avroio.py
##########
@@ -297,130 +217,15 @@ def advance_file_past_next_sync_marker(f, sync_marker):
def _create_avro_source(
- file_pattern=None, min_bundle_size=0, validate=False, use_fastavro=True):
+ file_pattern=None, min_bundle_size=0, validate=False, use_fasvro=True):
Review comment:
Actually, this is an internal function (not a public API), we don't need
to add use_fastavro here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]