damccorm commented on code in PR #30088:
URL: https://github.com/apache/beam/pull/30088#discussion_r1465033942
##########
sdks/python/apache_beam/transforms/enrichment_handlers/bigtable.py:
##########
@@ -122,9 +126,10 @@ def __call__(self, request: beam.Row, *args, **kwargs):
       if row:
         for cf_id, cf_v in row.cells.items():
           response_dict[cf_id] = {}
-          for k, v in cf_v.items():
-            response_dict[cf_id][k.decode(self._encoding)] = \
-                v[0].value.decode(self._encoding)
+          for col_id, col_v in cf_v.items():
+            response_dict[cf_id][col_id.decode(self._encoding)] = [
+                (v.value.decode(self._encoding), v.timestamp) for v in col_v
+            ]
Review Comment:
So basically, the enrichment response would now be
`beam.Row(<original values>, [({<bigTableResponse1>}, most_recent_timestamp), ({<bigTableResponse2>}, most_recent_timestamp)])`
instead of `beam.Row(<original values>, <joined values>)`?

My take is that this is probably less desirable for most simple use cases
(which is where our emphasis is), since callers would always need to filter
the result. If we do this, I'd vote to gate it as an option behind a
keyword-only arg. Thoughts?
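
To make the suggestion concrete, here is a rough sketch of what gating could look like. It is not the handler's actual code: `build_response`, the `include_timestamp` flag, and the `Cell` stand-in are all hypothetical names, and the nested-dict input only mimics the shape of `row.cells` from the diff above.

```python
from collections import namedtuple

# Hypothetical stand-in for a Bigtable cell: raw bytes plus a timestamp.
Cell = namedtuple("Cell", ["value", "timestamp"])


def build_response(cells, *, encoding="utf-8", include_timestamp=False):
  """Flattens {column_family: {column: [cells]}} into a plain dict.

  `include_timestamp` is a hypothetical keyword-only flag, not an
  existing handler argument.
  """
  response = {}
  for cf_id, columns in cells.items():
    response[cf_id] = {}
    for col_id, col_cells in columns.items():
      key = col_id.decode(encoding)
      if include_timestamp:
        # Opt-in: every cell in the column, paired with its timestamp.
        response[cf_id][key] = [
            (c.value.decode(encoding), c.timestamp) for c in col_cells
        ]
      else:
        # Default: only the first cell's value, as before this change.
        response[cf_id][key] = col_cells[0].value.decode(encoding)
  return response


cells = {"product": {b"name": [Cell(b"pixel 5", 2), Cell(b"pixel 4", 1)]}}
simple = build_response(cells)
detailed = build_response(cells, include_timestamp=True)
```

With the flag left at its default, simple pipelines keep the flat joined values and need no filtering; only callers that pass `include_timestamp=True` get the list of `(value, timestamp)` tuples.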
##########
sdks/python/apache_beam/transforms/enrichment_handlers/bigtable.py:
##########
@@ -122,9 +126,10 @@ def __call__(self, request: beam.Row, *args, **kwargs):
       if row:
         for cf_id, cf_v in row.cells.items():
           response_dict[cf_id] = {}
-          for k, v in cf_v.items():
-            response_dict[cf_id][k.decode(self._encoding)] = \
-                v[0].value.decode(self._encoding)
+          for col_id, col_v in cf_v.items():
+            response_dict[cf_id][col_id.decode(self._encoding)] = [
+                (v.value.decode(self._encoding), v.timestamp) for v in col_v
+            ]
Review Comment:
Also, something I just thought of: right now, the import here is
`from apache_beam.transforms.enrichment_handlers.bigtable import EnrichWithBigTable`.
Thoughts on changing it to
`from apache_beam.transforms.enrichment_handlers.bigtable import BigTableEnrichmentHandler`
to maintain a consistent feel with our other turnkey transforms?