Re: [PR] Add ApplyBucketsWithInterpolation TFTransform [beam]

via GitHub Wed, 22 May 2024 08:11:50 -0700


tvalentyn commented on code in PR #31291:
URL: https://github.com/apache/beam/pull/31291#discussion_r1610177110



##########
sdks/python/apache_beam/ml/transforms/tft.py:
##########
@@ -363,6 +363,42 @@ def apply_transform(
     return output
 
 
+@register_input_dtype(float)
+class ApplyBucketsWithInterpolation(TFTOperation):
+  def __init__(
+      self,
+      columns: List[str],
+      bucket_boundaries: Iterable[Union[int, float]],
+      name: Optional[str] = None):
+    """ Interpolates values within the provided buckest and then normalizes to
+    [0, 1].
+    
+    Input values are bucketized based on the provided boundaries such that the
+    input is mapped to a positive index i for which bucket_boundaries[i-1] <=
+    element < bucket_boundaries[i], if it exists. The values are then
+    normalized to the range [0,1] within the bucket, with NaN values being
+    mapped to 0.5.

Review Comment:
   Should we link to the TFT docs for more info, as some of the other ML ops do?
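
For readers of this thread who want the behavior spelled out, here is a minimal pure-Python sketch of what the docstring describes. The helper name `interpolate_in_buckets` is hypothetical, and the clamping of out-of-range values to 0 and 1 and the exact normalization formula are assumptions based on my reading of the TFT documentation, not something taken from this PR. It is not the Beam or TFT implementation.

```python
import bisect
import math
from typing import Sequence


def interpolate_in_buckets(x: float, boundaries: Sequence[float]) -> float:
    """Hypothetical sketch: map x to [0, 1] by interpolating within its bucket.

    Assumes `boundaries` is sorted ascending and has at least two entries.
    """
    if len(boundaries) < 2:
        raise ValueError("need at least two bucket boundaries")
    if math.isnan(x):
        # The docstring under review says NaN values are mapped to 0.5.
        return 0.5
    # Assumption: values at or beyond the outer boundaries clamp to 0 or 1.
    if x <= boundaries[0]:
        return 0.0
    if x >= boundaries[-1]:
        return 1.0
    # Find i such that boundaries[i-1] <= x < boundaries[i].
    i = bisect.bisect_right(boundaries, x)
    lo, hi = boundaries[i - 1], boundaries[i]
    fraction = (x - lo) / (hi - lo)
    # Each bucket covers an equal share of [0, 1].
    return (i - 1 + fraction) / (len(boundaries) - 1)
```

Under these assumptions, with boundaries `[0, 10, 20]` an input of `5.0` lands halfway through the first of two buckets and maps to `0.25`.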



##########
sdks/python/apache_beam/ml/transforms/tft.py:
##########
@@ -363,6 +363,42 @@ def apply_transform(
     return output
 
 
+@register_input_dtype(float)
+class ApplyBucketsWithInterpolation(TFTOperation):
+  def __init__(
+      self,
+      columns: List[str],
+      bucket_boundaries: Iterable[Union[int, float]],
+      name: Optional[str] = None):
+    """ Interpolates values within the provided buckest and then normalizes to

Review Comment:
   ```suggestion
       """Interpolates values within the provided buckets and then normalizes to
   ```



##########
sdks/python/apache_beam/ml/transforms/tft.py:
##########
@@ -363,6 +363,42 @@ def apply_transform(
     return output
 
 
+@register_input_dtype(float)
+class ApplyBucketsWithInterpolation(TFTOperation):
+  def __init__(
+      self,
+      columns: List[str],
+      bucket_boundaries: Iterable[Union[int, float]],
+      name: Optional[str] = None):
+    """ Interpolates values within the provided buckest and then normalizes to
+    [0, 1].
+    
+    Input values are bucketized based on the provided boundaries such that the
+    input is mapped to a positive index i for which bucket_boundaries[i-1] <=

Review Comment:
   Do we need to escape the code snippets, with backticks or something similar,
so that they render more cleanly in pydoc?
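
One possible shape for this, assuming the docstrings are rendered as reStructuredText (where double backticks mark an inline literal). The function name below is hypothetical and exists only to carry the docstring; the wording is taken from the hunk above.

```python
def bucket_index_doc_example() -> None:
    """Input values are bucketized based on the provided boundaries such that
    the input is mapped to a positive index ``i`` for which
    ``bucket_boundaries[i-1] <= element < bucket_boundaries[i]``, if it exists.
    """
```

Keeping the whole inequality inside a single literal avoids the expression being wrapped in the middle of the comparison.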



##########
sdks/python/apache_beam/ml/transforms/tft.py:
##########
@@ -363,6 +363,42 @@ def apply_transform(
     return output
 
 
+@register_input_dtype(float)
+class ApplyBucketsWithInterpolation(TFTOperation):
+  def __init__(
+      self,
+      columns: List[str],
+      bucket_boundaries: Iterable[Union[int, float]],
+      name: Optional[str] = None):
+    """ Interpolates values within the provided buckest and then normalizes to
+    [0, 1].
+    
+    Input values are bucketized based on the provided boundaries such that the
+    input is mapped to a positive index i for which bucket_boundaries[i-1] <=
+    element < bucket_boundaries[i], if it exists. The values are then
+    normalized to the range [0,1] within the bucket, with NaN values being
+    mapped to 0.5.
+
+    Args:
+      columns: A list of column names to apply the transformation on.
+      bucket_boundaries: A rank 2 Tensor or list representing the bucket

Review Comment:
   Is the type hint for bucket_boundaries set correctly? The docstring says
"rank 2 Tensor"; is that a 2D matrix?
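
To make the rank question concrete: a rank 2 tensor has two axes, so a nested list such as `[[0, 10, 20]]` is rank 2, while the flat `[0, 10, 20]` that `Iterable[Union[int, float]]` describes is rank 1. A small sketch follows, using a hypothetical helper `nesting_rank` that only inspects list or tuple nesting and assumes a regular (non-ragged) shape.

```python
from typing import Any


def nesting_rank(value: Any) -> int:
    """Hypothetical helper: count list/tuple nesting depth along the first element."""
    rank = 0
    while isinstance(value, (list, tuple)):
        rank += 1
        if not value:
            # An empty sequence still contributes one axis; stop descending.
            break
        value = value[0]
    return rank
```

If the docstring's "rank 2" wording is intended, the annotation would need to allow a nested sequence; if a flat list is intended, the docstring wording would need to change instead.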



