tvalentyn commented on a change in pull request #15900:
URL: https://github.com/apache/beam/pull/15900#discussion_r747018962
##########
File path: sdks/python/apache_beam/examples/fastavro_it_test.py
##########
@@ -135,29 +146,9 @@ def batch_indices(start):
assert result.state == PipelineState.DONE
with TestPipeline(is_integration_test=True) as fastavro_read_pipeline:
-
- fastavro_records = \
- fastavro_read_pipeline \
- | 'create-fastavro' >> Create(['%s*' % fastavro_output]) \
- | 'read-fastavro' >> ReadAllFromAvro() \
- | Map(lambda rec: (rec['number'], rec))
-
- def check(elem):
- v = elem[1]
-
- def assertEqual(l, r):
- if l != r:
- raise BeamAssertException('Assertion failed: %s == %s' % (l, r))
-
- assertEqual(sorted(v.keys()), ['fastavro'])
- fastavro_values = v['fastavro']
- assertEqual(len(fastavro_values), 1)
-
- # pylint: disable=expression-not-assigned
- {
- 'fastavro': fastavro_records
- } \
- | CoGroupByKey() \
+ fastavro_read_pipeline \
+ | 'create-fastavro' >> Create(['%s*' % fastavro_output]) \
+ | 'read-fastavro' >> ReadAllFromAvro() \
Review comment:
Can we also compare the values for the keys, to make sure that no values
were lost during the write-read operation?
I think it could be accomplished by running a co-GBK of the PCollection coming
from `| 'read-fastavro' >> ReadAllFromAvro()` and a PCollection of the
generated data. Then we can extract the set of elements tagged with the first
PCollection and the set tagged with the second PCollection, and verify that
these sets are the same for every key in the co-GBK output.
--
This is an automated message from the Apache Git Service.