rszper commented on code in PR #29910:
URL: https://github.com/apache/beam/pull/29910#discussion_r1441038338
##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -53,7 +50,10 @@ modules for machine learning tasks.
## Transforms {#transforms}
You can use `MLTransform` to perform the following data processing transforms.
Review Comment:
```suggestion
You can use `MLTransform` to perform various data processing transforms.
```
##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -53,7 +50,10 @@ modules for machine learning tasks.
## Transforms {#transforms}
You can use `MLTransform` to perform the following data processing transforms.
-For information about the transforms, see
+
+### Data Processing Transforms using TFT
Review Comment:
```suggestion
### Data processing transforms that use TFT
```
##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -70,9 +70,13 @@ TensorFlow documentation.
| TFIDF | See
[`tft.tfidf`](https://www.tensorflow.org/tfx/transform/api_docs/python/tft/tfidf)
in the TensorFlow documentation. |:
{{< /table >}}
-Apply the transforms on either single or multiple columns passed as a
-`dict` on structured data. Keys are column names and values are lists containing
-each column's data.
+### Generate Text Embeddings
Review Comment:
```suggestion
### Text embedding transforms
```
##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -70,9 +70,13 @@ TensorFlow documentation.
| TFIDF | See
[`tft.tfidf`](https://www.tensorflow.org/tfx/transform/api_docs/python/tft/tfidf)
in the TensorFlow documentation. |:
{{< /table >}}
-Apply the transforms on either single or multiple columns passed as a
-`dict` on structured data. Keys are column names and values are lists containing
-each column's data.
+### Generate Text Embeddings
+
+{{< table >}}
+| Transform name | Description |
+| ------- | ---------------|
+| SentenceTransformerEmbeddings | Uses
[sentence-transformer](https://huggingface.co/sentence-transformers) models to
generate text embeddings. sentence-transformers models hosted on HuggingFace
hub are supported.
Review Comment:
```suggestion
| SentenceTransformerEmbeddings | Uses the Hugging Face
[`sentence-transformers`](https://huggingface.co/sentence-transformers) models
to generate text embeddings.
```
##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -53,7 +50,10 @@ modules for machine learning tasks.
## Transforms {#transforms}
You can use `MLTransform` to perform the following data processing transforms.
-For information about the transforms, see
+
+### Data Processing Transforms using TFT
+
+For information about the tft based transforms, see
Review Comment:
```suggestion
The following transforms available in the `MLTransform` class come from
the TensorFlow Transform (TFT) library. TFT offers specialized processing
modules for machine learning tasks. For information about these transforms,
see
```
##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -43,6 +39,7 @@ modules for machine learning tasks.
- Count the occurrences of words in all the documents to calculate
[TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf)
weights.
+ - Generate [embeddings](https://en.wikipedia.org/wiki/Embedding) on text data using LLMs.
Review Comment:
```suggestion
- Generate [embeddings](https://en.wikipedia.org/wiki/Embedding) on
text data using large language models (LLMs).
```
##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -70,9 +70,13 @@ TensorFlow documentation.
| TFIDF | See
[`tft.tfidf`](https://www.tensorflow.org/tfx/transform/api_docs/python/tft/tfidf)
in the TensorFlow documentation. |:
{{< /table >}}
-Apply the transforms on either single or multiple columns passed as a
-`dict` on structured data. Keys are column names and values are lists containing
-each column's data.
+### Generate Text Embeddings
+
Review Comment:
Add a sentence after the heading but before the table:
You can use `MLTransform` to generate embeddings that you can use to push
data into vector databases or to run inference.
##########
website/www/site/content/en/documentation/ml/preprocess-data.md:
##########
@@ -70,9 +70,13 @@ TensorFlow documentation.
| TFIDF | See
[`tft.tfidf`](https://www.tensorflow.org/tfx/transform/api_docs/python/tft/tfidf)
in the TensorFlow documentation. |:
{{< /table >}}
-Apply the transforms on either single or multiple columns passed as a
-`dict` on structured data. Keys are column names and values are lists containing
-each column's data.
+### Generate Text Embeddings
+
+{{< table >}}
+| Transform name | Description |
+| ------- | ---------------|
+| SentenceTransformerEmbeddings | Uses
[sentence-transformer](https://huggingface.co/sentence-transformers) models to
generate text embeddings. sentence-transformers models hosted on HuggingFace
hub are supported.
+| VertexAITextEmbeddings | Uses [Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings) text embedding models to generate text embeddings.
Review Comment:
```suggestion
| VertexAITextEmbeddings | Uses models from the [Vertex AI text-embeddings
API](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings)
to generate text embeddings.
```