rszper commented on code in PR #29910:
URL: https://github.com/apache/beam/pull/29910#discussion_r1441038338


##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -53,7 +50,10 @@ modules for machine learning tasks.
 ## Transforms {#transforms}
 
 You can use `MLTransform` to perform the following data processing transforms.

Review Comment:
   ```suggestion
   You can use `MLTransform` to perform various data processing transforms.
   ```



##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -53,7 +50,10 @@ modules for machine learning tasks.
 ## Transforms {#transforms}
 
 You can use `MLTransform` to perform the following data processing transforms.
-For information about the transforms, see
+
+### Data Processing Transforms using TFT

Review Comment:
   ```suggestion
   ### Data processing transforms that use TFT
   ```



##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -70,9 +70,13 @@ TensorFlow documentation.
 | TFIDF | See [`tft.tfidf`](https://www.tensorflow.org/tfx/transform/api_docs/python/tft/tfidf) in the TensorFlow documentation. |:
 {{< /table >}}
 
-Apply the transforms on either single or multiple columns passed as a
-`dict` on structured data. Keys are column names and values are lists containing
-each column's data.
+### Generate Text Embeddings

Review Comment:
   ```suggestion
   ### Text embedding transforms
   ```



##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -70,9 +70,13 @@ TensorFlow documentation.
 | TFIDF | See [`tft.tfidf`](https://www.tensorflow.org/tfx/transform/api_docs/python/tft/tfidf) in the TensorFlow documentation. |:
 {{< /table >}}
 
-Apply the transforms on either single or multiple columns passed as a
-`dict` on structured data. Keys are column names and values are lists containing
-each column's data.
+### Generate Text Embeddings
+
+{{< table >}}
+| Transform name | Description |
+| ------- | ---------------|
+| SentenceTransformerEmbeddings | Uses [sentence-transformer](https://huggingface.co/sentence-transformers) models to generate text embeddings. sentence-transformers models hosted on HuggingFace hub are supported.

Review Comment:
   ```suggestion
   | SentenceTransformerEmbeddings | Uses the Hugging Face [`sentence-transformers`](https://huggingface.co/sentence-transformers) models to generate text embeddings.
   ```



##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -53,7 +50,10 @@ modules for machine learning tasks.
 ## Transforms {#transforms}
 
 You can use `MLTransform` to perform the following data processing transforms.
-For information about the transforms, see
+
+### Data Processing Transforms using TFT
+
+For information about the tft based transforms, see

Review Comment:
   ```suggestion
   The following transforms available in the `MLTransform` class come from
   the TensorFlow Transform (TFT) library. TFT offers specialized processing
   modules for machine learning tasks. For information about these transforms, see
   ```



##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -43,6 +39,7 @@ modules for machine learning tasks.
     -   Count the occurrences of words in all the documents to calculate
         [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf)
         weights.
+    -  Generate [embeddings](https://en.wikipedia.org/wiki/Embedding) on text data using LLMs.

Review Comment:
   ```suggestion
       -  Generate [embeddings](https://en.wikipedia.org/wiki/Embedding) on text data using large language models (LLMs).
   ```
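For context on the TF-IDF bullet quoted in this hunk (counting word occurrences across all documents to calculate TF-IDF weights), here is a minimal pure-Python sketch of that calculation. It assumes plain whitespace tokenization and the textbook weighting tf × log(N / df); `tft.tfidf` applies its own options (for example, smoothing), so its values may differ.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term in the document / number of terms in the document
    idf = log(N / df), where N is the number of documents and df is the
          number of documents that contain the term at least once.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = ["the cat sat", "the dog sat", "the cat ran"]
print(tfidf(docs))
```

A term that appears in every document (here, "the") gets an IDF of log(1) = 0, so its weight is 0 regardless of how often it occurs.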



##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -70,9 +70,13 @@ TensorFlow documentation.
 | TFIDF | See [`tft.tfidf`](https://www.tensorflow.org/tfx/transform/api_docs/python/tft/tfidf) in the TensorFlow documentation. |:
 {{< /table >}}
 
-Apply the transforms on either single or multiple columns passed as a
-`dict` on structured data. Keys are column names and values are lists containing
-each column's data.
+### Generate Text Embeddings
+

Review Comment:
   Add a sentence after the heading but before the table:
   
   You can use `MLTransform` to generate embeddings that you can use to push data into vector databases or to run inference.



##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -70,9 +70,13 @@ TensorFlow documentation.
 | TFIDF | See [`tft.tfidf`](https://www.tensorflow.org/tfx/transform/api_docs/python/tft/tfidf) in the TensorFlow documentation. |:
 {{< /table >}}
 
-Apply the transforms on either single or multiple columns passed as a
-`dict` on structured data. Keys are column names and values are lists containing
-each column's data.
+### Generate Text Embeddings
+
+{{< table >}}
+| Transform name | Description |
+| ------- | ---------------|
+| SentenceTransformerEmbeddings | Uses [sentence-transformer](https://huggingface.co/sentence-transformers) models to generate text embeddings. sentence-transformers models hosted on HuggingFace hub are supported.
+| VertexAITextEmbeddings | Uses [Vertex AI]( https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings) text embedding models to generate text embeddings.

Review Comment:
   ```suggestion
   | VertexAITextEmbeddings | Uses models from [the Vertex AI text embeddings API](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings) to generate text embeddings.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
