[GitHub] [beam] robertwb commented on a diff in pull request #27939: Update how we specify artifact modes

via GitHub Wed, 09 Aug 2023 13:15:39 -0700


robertwb commented on code in PR #27939:
URL: https://github.com/apache/beam/pull/27939#discussion_r1289144166



##########
sdks/python/apache_beam/ml/transforms/base.py:
##########
@@ -115,8 +115,8 @@ class 
MLTransform(beam.PTransform[beam.PCollection[ExampleT],
   def __init__(
       self,
       *,
-      artifact_location: str,
-      artifact_mode: str = ArtifactMode.PRODUCE,
+      write_artifact_location: str = '',

Review Comment:
   Or would `Optional[str] = None` be preferable? 



##########
sdks/python/apache_beam/ml/transforms/base.py:
##########
@@ -143,23 +143,43 @@ def __init__(
         retrieved from this location. Note that when consuming artifacts,
         it is not necessary to pass the transforms since they are inherently
         stored within the artifacts themselves. The value assigned to
-        `artifact_location` should be a valid storage path where the artifacts
-        can be written to or read from.
+        `write_artifact_location` should be a valid storage path where the
+        artifacts can be written to. Only one of write_artifact_location and
+        read_artifact_location should be specified.
+      read_artifact_location: A storage location to read artifacts resulting
+        froma previous MLTransform. These artifacts include transformations
+        applied to the dataset and generated values like min, max from
+        ScaleTo01, and mean, var from ScaleToZScore. Note that when consuming
+        artifacts, it is not necessary to pass the transforms since they are
+        inherently stored within the artifacts themselves. The value assigned
+        to `read_artifact_location` should be a valid storage path where the
+        artifacts can be read from. Only one of write_artifact_location and
+        read_artifact_location should be specified.
       transforms: A list of transforms to apply to the data. All the transforms
         are applied in the order they are specified. The input of the
         i-th transform is the output of the (i-1)-th transform. Multi-input
         transforms are not supported yet.
-      artifact_mode: Whether to produce or consume artifacts. If set to
-        'consume', MLTransform will assume that the artifacts are already
-        computed and stored in the artifact_location. Pass the same artifact
-        location that was passed during produce phase to ensure that the
-        right artifacts are read. If set to 'produce', MLTransform
-        will compute the artifacts and store them in the artifact_location.
-        The artifacts will be read from this location during the consume phase.
     """
     if transforms:
       _ = [self._validate_transform(transform) for transform in transforms]
 
+    if len(read_artifact_location) > 0 and len(write_artifact_location) > 0:

Review Comment:
   Regardless of the comment above, I think `if read_artifact_location and 
write_artifact_location:` reads clearer. Similarly below one can write `if not 
read_artifact_location and not write_artifact_location` (or, equivalently `if 
not (read_artifact_location or write_artifact_location)`).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [beam] robertwb commented on a diff in pull request #27939: Update how we specify artifact modes

Reply via email to