damccorm commented on code in PR #25393:
URL: https://github.com/apache/beam/pull/25393#discussion_r1101933389
##########
sdks/python/apache_beam/ml/inference/utils.py:
##########
@@ -94,7 +94,7 @@ class _GetLatestFileByTimeStamp(beam.DoFn):
started. If no such files are found, it returns a default file as fallback.
"""
TIME_STATE = CombiningValueStateSpec(
- 'count', combine_fn=partial(max, default=_START_TIME_STAMP))
+ 'max', combine_fn=partial(max, default=_START_TIME_STAMP))
Review Comment:
It seems the linter doesn't like the change I suggested (count -> max).
If that's the case, feel free to switch back.
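For context, the state name is only a label; the behavior comes from the combine function. Here is a minimal plain-Python sketch (no Beam dependency) of what `partial(max, default=...)` does, assuming it is called on an iterable of accumulated timestamps. The value of `_START_TIME_STAMP` below is a placeholder for illustration, not the one defined in `utils.py`.

```python
from functools import partial

# Placeholder value for illustration only; the real constant lives in utils.py.
_START_TIME_STAMP = 100.0

# Same shape as the combine_fn in the diff: max over an iterable,
# falling back to the start timestamp when the iterable is empty.
combine = partial(max, default=_START_TIME_STAMP)

# No timestamps observed yet: the default is returned.
empty_result = combine([])

# Several timestamps observed: the largest one is returned.
latest = combine([120.0, 105.5, 130.25])

print(empty_result, latest)
```

Either name describes the label, but `max` matches what the combine function computes.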
##########
sdks/python/apache_beam/ml/inference/utils.py:
##########
@@ -125,17 +125,21 @@ def __init__(
"""
Watches a directory for updates to files matching a given file pattern.
- **Note**: Start timestamp will be defaulted to timestamp when pipeline
- was run. All the files matching file_pattern, that are uploaded before
- the pipeline started will be discarded.
-
Args:
- file_pattern: A glob pattern used to watch a directory for model
- updates.
+ file_pattern: The file path to read from as a local file path or a
+ GCS ``gs://`` path. The path can contain glob characters
+ (``*``, ``?``, and ``[...]`` sets).
interval: Interval at which to check for files matching file_pattern
in seconds.
stop_timestamp: Timestamp after which no more files will be checked.
+ Constraints:
+ 1. If the file is read and then there is an update to that file, this
+ transform will ignore that update. Always update a file with unique
+ name.
Review Comment:
This doesn't quite convey that the following won't work:
```
- Create model with name A (model A is now used)
- Create model with name B (model B is now used)
- Delete file with name A
- Create new model with name A (model A is still not used because of
stateful DoFn)
```
Could we update this to something like: `Any previously used filenames cannot
be reused. If a file is added or updated under a previously used filename,
this transform will ignore that update. To trigger a model update, always
upload a file with a unique name.`
(I tried to keep as much of your wording as possible; feel free to reword if
you don't like my phrasing.)
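To make the scenario concrete, here is a simplified plain-Python model of the behavior described above. It is only a sketch under the assumption that the transform remembers every filename it has already emitted; `SeenNameFilter` and `observe` are hypothetical names for illustration, not part of the Beam API or of this PR.

```python
class SeenNameFilter:
    """Hypothetical stand-in for per-key state that remembers emitted filenames."""

    def __init__(self):
        self._seen = set()

    def observe(self, filename: str) -> bool:
        """Return True if the filename is new and would trigger a model update."""
        if filename in self._seen:
            # The name was used before, so the update is ignored.
            return False
        self._seen.add(filename)
        return True


watcher = SeenNameFilter()
results = [
    watcher.observe("A"),  # model A is created and used
    watcher.observe("B"),  # model B is created and used
    # Deleting file A does not clear the remembered name.
    watcher.observe("A"),  # new model re-uploaded as A is ignored
]
print(results)
```

In this model, only a filename that has never been seen produces an update, which is why the docstring should tell users to always upload under a unique name.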
##########
sdks/python/apache_beam/examples/inference/pytorch_image_classification_with_side_inputs.py:
##########
@@ -26,8 +26,11 @@
This pipeline follows the pattern from
https://beam.apache.org/documentation/patterns/side-inputs/
-This pipeline expects a PubSub topic as source, which emits an image
-path(UTF-8 encoded) that is accessible by the pipeline.
+To use the PubSub reading from a topic in the pipeline as source, you can
+publish a path to the model(resnet152 used in the pipeline from
+torchvision.models.resnet152) to the PubSub topic. Then pass that
+topic via command line arg --topic. The published path(str) should be
+UTF-8 encoded.
Review Comment:
It looks like the docs precommit doesn't like some of your indentation here.