Re: [PR] Update MLTransform docs [beam]

via GitHub Thu, 04 Jan 2024 06:29:15 -0800


rszper commented on code in PR #29910:
URL: https://github.com/apache/beam/pull/29910#discussion_r1441823009



##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -45,18 +39,23 @@ modules for machine learning tasks.
     -   Count the occurrences of words in all the documents to calculate
         [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf)
         weights.
+    -  Generate [embeddings](https://en.wikipedia.org/wiki/Embedding) on text data using large language models (LLMs).

Review Comment:
   We might want to make this the first list item, depending on how much we want to emphasize embeddings. We could also move the "Text embedding transforms" section before the "Data processing transforms that use TFT" section.
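For reviewers who want a concrete picture of the TF-IDF item, here is a minimal plain-Python sketch of TF-IDF weighting. It is not Beam or TFT code. The `tfidf` helper is hypothetical, and it assumes the smoothed IDF variant `1 + log((1 + N) / (1 + df))`; check that against the `tft.tfidf` documentation before treating it as equivalent. The first loop is the corpus-wide pass that counts document frequencies, which is why this transform needs all the documents rather than a single element.

```python
import math
from collections import Counter


def tfidf(docs):
    """Compute TF-IDF weights for a list of tokenized, non-empty documents.

    tf  = count of the term in the document / number of tokens in the document
    idf = 1 + log((1 + N) / (1 + df))  (assumed smoothed variant)
    """
    n_docs = len(docs)
    # Pass 1 over the whole corpus: in how many documents does each term occur?
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))
    # Pass 2: per-document weights that use the corpus-wide statistics.
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * (1 + math.log((1 + n_docs) / (1 + doc_freq[term])))
            for term, count in counts.items()
        })
    return weights


docs = [["beam", "ml", "beam"], ["ml", "transform"]]
w = tfidf(docs)
for row in w:
    print(row)
```

A term that appears in every document, like `ml` here, gets an IDF of exactly 1, so its weight reduces to its term frequency.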



##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -45,18 +39,23 @@ modules for machine learning tasks.
     -   Count the occurrences of words in all the documents to calculate
         [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf)
         weights.
+    -  Generate [embeddings](https://en.wikipedia.org/wiki/Embedding) on text data using large language models (LLMs).
 
 ## Support and limitations {#support}
 
--   Available in the Apache Beam Python SDK versions 2.50.0 and later.
--   Supports Python 3.8 and 3.9.
+-   Available in the Apache Beam Python SDK versions 2.53.0 and later.
+-   Supports Python 3.8, 3.9 and 3.10.

Review Comment:
   ```suggestion
   -   Supports Python 3.8, 3.9, and 3.10.
   ```



##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -45,18 +39,23 @@ modules for machine learning tasks.
     -   Count the occurrences of words in all the documents to calculate
         [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf)
         weights.
+    -  Generate [embeddings](https://en.wikipedia.org/wiki/Embedding) on text data using large language models (LLMs).
 
 ## Support and limitations {#support}
 
--   Available in the Apache Beam Python SDK versions 2.50.0 and later.
--   Supports Python 3.8 and 3.9.
+-   Available in the Apache Beam Python SDK versions 2.53.0 and later.
+-   Supports Python 3.8, 3.9 and 3.10.
 -   Only available for pipelines that use [default windows](/documentation/programming-guide/#single-global-window).
--   Only supports one-to-one transform mapping on a single element.
 
 ## Transforms {#transforms}
 
-You can use `MLTransform` to perform the following data processing transforms.
-For information about the transforms, see
+You can use `MLTransform` to perform various data processing transforms.

Review Comment:
   ```suggestion
   You can use `MLTransform` to generate text embeddings and to perform various data processing transforms.
   ```



##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -197,32 +198,3 @@ Replace the following values:
 For more examples, see
 [MLTransform for data processing](/documentation/transforms/python/elementwise/mltransform)
 in the [transform catalog](/documentation/transforms/python/overview/).
-
-### ScaleTo01 example {#scaleto01}
-
-This example demonstrates how to use `MLTransform` to normalize your data
-between 0 and 1 by using the minimum and maximum values from your entire
-dataset. `MLTransform` uses the `ScaleTo01` transformation.
-
-Use the following snippet to apply `ScaleTo01` on column `x` of the input
-data.
-
-```
-data_pcoll | MLTransform(write_artifact_location=<LOCATION>).with_transform(ScaleTo01(columns=['x']))
-```
-
-The `ScaleTo01` transformation produces two artifacts: the `min` and the `max`
-of the entire dataset. For more information, see the
-[Artifacts](#artifacts) section on this page.
-
-## Metrics {#metrics}

Review Comment:
   Just want to verify whether removing the Metrics section is intentional.
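If the ScaleTo01 example does go away, here is a minimal plain-Python sketch of the computation it described, for reference. It is not the Beam `ScaleTo01` implementation. The `fit_min_max` and `scale_to_01` helpers are hypothetical, and mapping a constant column to 0.0 is an assumption (TFT may handle that case differently). The rows mirror the structured-data layout from the removed text: dicts keyed by column name, each holding a list of values. The first pass produces the two artifacts (`min` and `max` of the entire dataset), and the second pass reuses them.

```python
def fit_min_max(rows, column):
    """First pass over the whole dataset: compute the min and max artifacts."""
    values = [v for row in rows for v in row[column]]
    return min(values), max(values)


def scale_to_01(row, column, lo, hi):
    """Second pass: rescale one row's column with the stored artifacts."""
    span = hi - lo
    # Assumption: a constant column maps to 0.0; TFT may handle this differently.
    scaled = [0.0 if span == 0 else (v - lo) / span for v in row[column]]
    return {**row, column: scaled}


# Rows are dicts keyed by column name; each value is a list of numbers.
rows = [{"x": [2.0, 4.0]}, {"x": [6.0, 10.0]}]
lo, hi = fit_min_max(rows, "x")
scaled = [scale_to_01(row, "x", lo, hi) for row in rows]
print(lo, hi)   # 2.0 10.0
print(scaled)   # [{'x': [0.0, 0.25]}, {'x': [0.5, 1.0]}]
```

Keeping `lo` and `hi` separate from the scaling step is what lets the same artifacts be applied later to new data, for example `scale_to_01({"x": [6.0]}, "x", lo, hi)` gives `{'x': [0.5]}`.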



##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -73,18 +72,20 @@ TensorFlow documentation.
 | TFIDF | See [`tft.tfidf`](https://www.tensorflow.org/tfx/transform/api_docs/python/tft/tfidf)
 in the TensorFlow documentation. |:
 {{< /table >}}
 
-Apply the transforms on either single or multiple columns passed as a
-`dict` on structured data. Keys are column names and values are lists containing
-each column's data.
+### Text embedding transforms
+
+You can use `MLTranfrorm` to generate embeddings that you can use to push data into vector databases or to run inference.

Review Comment:
   ```suggestion
   You can use `MLTransform` to generate embeddings that you can use to push data into vector databases or to run inference.
   ```
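To illustrate what "push data into vector databases" buys you, here is a minimal plain-Python sketch of the kind of lookup a vector database performs on stored embeddings. It is not Beam code and not any vector database's API: a brute-force in-memory search stands in for the database, the `cosine_similarity` and `nearest` helpers are hypothetical, and the 3-dimensional vectors are made-up toy values rather than output from any embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, index):
    """Return the key of the stored vector most similar to the query."""
    return max(index, key=lambda key: cosine_similarity(query, index[key]))


# Toy vectors standing in for stored embeddings; real models produce much longer ones.
index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}
print(nearest([0.9, 0.1, 0.0], index))  # doc_a
```

Real vector databases typically replace the linear scan with an approximate nearest-neighbor index, but the similarity comparison is the same idea.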



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
