[beam] branch master updated: restructure ml overview website page (#25607)

damccorm Fri, 24 Feb 2023 07:13:14 -0800

This is an automated email from the ASF dual-hosted git repository.

damccorm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git



The following commit(s) were added to refs/heads/master by this push:
     new bddfd86afd2 restructure ml overview website page (#25607)
bddfd86afd2 is described below

commit bddfd86afd26e12fab01ac5fdacc2b387fc598e8
Author: Juta Staes <[email protected]>
AuthorDate: Fri Feb 24 16:12:49 2023 +0100

    restructure ml overview website page (#25607)
    
    * restructure ml overview website page
    
    * small edits ML website
---
 .../www/site/content/en/documentation/ml/overview.md | 20 +++++++++++++++-----
 .../partials/section-menu/en/documentation.html      |  6 +++---
 website/www/site/static/images/ml-workflows.svg      |  2 +-
 3 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/website/www/site/content/en/documentation/ml/overview.md b/website/www/site/content/en/documentation/ml/overview.md
index dabe7e9629f..feec2c4e807 100644
--- a/website/www/site/content/en/documentation/ml/overview.md
+++ b/website/www/site/content/en/documentation/ml/overview.md
@@ -36,12 +36,12 @@ Let’s take a look at the different building blocks that we need to create an e
 2. **Data validation**: After you receive your data, check the quality of your data. For example, you might want to detect outliers and calculate standard deviations and class distributions.
 3. **Data preprocessing**: After you validate your data, transform the data so that it is ready to use to train your model.
 4. Model training: When your data is ready, you can start training your AI/ML model. This step is typically repeated multiple times, depending on the quality of your trained model.
-5. Model validation: Before you deploy your new model, validate its performance and accuracy.
+5. **Model validation**: Before you deploy your new model, validate its performance and accuracy.
 6. **Model deployment**: Deploy your model, using it to run inference on new or existing data.
 
 To keep your model up to date and performing well as your data grows and evolves, run these steps multiple times. In addition, you can apply MLOps to your project to automate the AI/ML workflows throughout the model and data lifecycle. Use orchestrators to automate this flow and to handle the transition between the different building blocks in your project.
 
-You can use Apache Beam for data validation, data preprocessing, and model deployment/inference. The next section examines these building blocks in more detail and explores how they can be orchestrated.
+You can use Apache Beam for data validation, data preprocessing, model validation, and model deployment/inference. The next section examines these building blocks in more detail and explores how they can be orchestrated.
 
 ## Data processing
 
@@ -62,10 +62,12 @@ Beam provides different ways to implement inference as part of your pipeline. Yo
 
 The recommended way to implement inference is by using the [RunInference API](/documentation/sdks/python-machine-learning/). RunInference takes advantage of existing Apache Beam concepts, such as the `BatchElements` transform and the `Shared` class, to enable you to use models in your pipelines to create transforms optimized for machine learning inferences. The ability to create arbitrarily complex workflow graphs also allows you to build multi-model pipelines.
 
-You can integrate your model in your pipeline by using the corresponding model handlers. A `ModelHandler` is an object that wraps the underlying model and allows you to configure its parameters. Model handlers are available for PyTorch, scikit-learn, and TensorFlow. Examples of how to use RunInference for PyTorch, scikit-learn, and TensorFlow are shown in this [notebook](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_sklearn.ipynb).
+You can integrate your model in your pipeline by using the corresponding model handlers. A `ModelHandler` is an object that wraps the underlying model and allows you to configure its parameters. Model handlers are available for PyTorch, scikit-learn, and TensorFlow. Examples of how to use RunInference for PyTorch, scikit-learn, and TensorFlow are shown in the [RunInference notebook](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_pytorch_tensorflow_skl
 [...]
 
 Because they can process multiple computations simultaneously, GPUs are optimized for training artificial intelligence and deep learning models. RunInference also allows you to use GPUs for significant inference speedup. An example of how to use RunInference with GPUs is demonstrated on the [RunInference metrics](/documentation/ml/runinference-metrics) page.
 
+Another use case is running machine learning models on dedicated hardware devices. [Nvidia TensorRT](https://developer.nvidia.com/tensorrt) is a machine learning framework used to run inference on Nvidia hardware. See [TensorRT Inference](/documentation/ml/tensorrt-runinference) for an example of a pipeline that uses TensorRT and Beam with the RunInference transform and a BERT-based text classification model.
+
 ### Custom Inference
 
 The RunInference API doesn't currently support making remote inference calls using, for example, the Natural Language API or the Cloud Vision API. Therefore, in order to use these remote APIs with Apache Beam, you need to write custom inference calls. The [Remote inference in Apache Beam notebook](https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/custom_remote_inference.ipynb) shows how to implement a custom remote inference call using `beam.DoFn`. When you implement  [...]
@@ -78,6 +80,12 @@ The RunInference API doesn't currently support making remote inference calls usi
 
 * Consider monitoring and measuring the performance of a pipeline when deploying, because monitoring can provide insight into the status and health of the application.
 
+## Model validation
+
+Model validation allows you to benchmark your model’s performance against a previously unseen dataset. You can extract chosen metrics, create visualizations, log metadata, and compare the performance of different models with the end goal of validating whether your model is ready to deploy. Beam provides support for running model evaluation on a TensorFlow model directly inside your pipeline.
+
+Further reading:
+* [ML model evaluation](/documentation/ml/model-evaluation): Illustrates how to integrate model evaluation as part of your pipeline by using [TensorFlow Model Analysis (TFMA)](https://www.tensorflow.org/tfx/guide/tfma).
 
 ## Orchestrators
 
@@ -85,13 +93,15 @@ In order to automate and track the AI/ML workflows throughout your project, you
 
 When you use Apache Beam as one of the building blocks in your project, these orchestrators are able to launch your Apache Beam job and to keep track of the input and output of your pipeline. These tasks are essential when moving your AI/ML solution into production, because they allow you to handle your model and data over time and improve the quality and reproducibility of results.
 
+Further reading:
+* [ML Workflow Orchestration](/documentation/ml/orchestration): Illustrates how to orchestrate ML workflows consisting of multiple steps by using Kubeflow Pipelines and TensorFlow Extended.
+
 ## Examples
 
 You can find examples of end-to-end AI/ML pipelines for several use cases:
-* [ML Workflow Orchestration](/documentation/ml/orchestration): Illustrates how to orchestrate ML workflows consisting of multiple steps by using Kubeflow Pipelines and Tensorflow Extended.
+
 * [Multi model pipelines in Beam](/documentation/ml/multi-model-pipelines): Explains how multi-model pipelines work and gives an overview of what you need to know to build one using the RunInference API.
 * [Online Clustering in Beam](/documentation/ml/online-clustering): Demonstrates how to set up a real-time clustering pipeline that can read text from Pub/Sub, convert the text into an embedding using a transformer-based language model with the RunInference API, and cluster the text using BIRCH with stateful processing.
 * [Anomaly Detection in Beam](/documentation/ml/anomaly-detection): Demonstrates how to set up an anomaly detection pipeline that reads text from Pub/Sub in real time and then detects anomalies using a trained HDBSCAN clustering model with the RunInference API.
 * [Large Language Model Inference in Beam](/documentation/ml/large-language-modeling): Demonstrates a pipeline that uses RunInference to perform translation with the T5 language model which contains 11 billion parameters.
 * [Per Entity Training in Beam](/documentation/ml/per-entity-training): Demonstrates a pipeline that trains a Decision Tree Classifier per education level for predicting if the salary of a person is >= 50k.
-* [TensorRT Inference](/documentation/ml/tensorrt-runinference): Demonstrates a pipeline that uses TensorRT with the RunInference transform and a BERT-based text classification model.
diff --git a/website/www/site/layouts/partials/section-menu/en/documentation.html b/website/www/site/layouts/partials/section-menu/en/documentation.html
index 2722baf9bb7..61d7aa9fe35 100755
--- a/website/www/site/layouts/partials/section-menu/en/documentation.html
+++ b/website/www/site/layouts/partials/section-menu/en/documentation.html
@@ -217,16 +217,16 @@
 
   <ul class="section-nav-list">
     <li><a href="/documentation/ml/overview/">Overview</a></li>
-    <li><a href="/documentation/ml/orchestration/">Workflow Orchestration</a></li>
     <li><a href="/documentation/ml/data-processing/">Data processing</a></li>
+    <li><a href="/documentation/ml/runinference-metrics/">RunInference Metrics</a></li>
+    <li><a href="/documentation/ml/tensorrt-runinference">TensorRT Inference</a></li>
     <li><a href="/documentation/ml/model-evaluation/">Model evaluation</a></li>
+    <li><a href="/documentation/ml/orchestration/">Workflow Orchestration</a></li>
     <li><a href="/documentation/ml/multi-model-pipelines/">Multi-model pipelines</a></li>
     <li><a href="/documentation/ml/online-clustering/">Online Clustering</a></li>
-    <li><a href="/documentation/ml/runinference-metrics/">RunInference Metrics</a></li>
     <li><a href="/documentation/ml/anomaly-detection/">Anomaly Detection</a></li>
     <li><a href="/documentation/ml/large-language-modeling">Large Language Model Inference in Beam</a></li>
     <li><a href="/documentation/ml/per-entity-training">Per Entity Training in Beam</a></li>
-    <li><a href="/documentation/ml/tensorrt-runinference">TensorRT Inference</a></li>
   </ul>
 </li>
 <li class="section-nav-item--collapsible">
diff --git a/website/www/site/static/images/ml-workflows.svg b/website/www/site/static/images/ml-workflows.svg
index 2a9cb3c1f27..90130a40672 100755
--- a/website/www/site/static/images/ml-workflows.svg
+++ b/website/www/site/static/images/ml-workflows.svg
@@ -14,4 +14,4 @@ limitations under the License.
 -->
 <!-- Do not edit this file with editors other than diagrams.net -->
 <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
-<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="890px" height="290px" viewBox="-0.5 -0.5 890 290" content="&lt;mxfile host=&quot;app.diagrams.net&quot; modified=&quot;2022-09-30T09:13:42.911Z&quot; agent=&quot;5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36&quot; etag=&quot;jwziFvvco7NdpbSyV7Nh&quot; version=&quot;20.3.7&quot; type=&quot;google&quot;&gt;&lt;diagram id=&quot;8C2 [...]
\ No newline at end of file
+<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="890px" height="290px" viewBox="-0.5 -0.5 890 290" content="&lt;mxfile host=&quot;app.diagrams.net&quot; modified=&quot;2023-02-23T11:08:52.799Z&quot; agent=&quot;5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36&quot; etag=&quot;t7LQG5DbOQ_CrP_ZUB0z&quot; version=&quot;20.8.23&quot; type=&quot;google&quot;&gt;&lt;diagram id=&quot;8C [...]
\ No newline at end of file
