[ https://issues.apache.org/jira/browse/BEAM-11544?focusedWorklogId=559138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-559138 ]
ASF GitHub Bot logged work on BEAM-11544:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 01/Mar/21 00:42
Start Date: 01/Mar/21 00:42
Worklog Time Spent: 10m
Work Description: rezarokni commented on a change in pull request #13644:
URL: https://github.com/apache/beam/pull/13644#discussion_r584390651
##########
File path: website/www/site/content/en/documentation/patterns/bqml.md
##########
@@ -0,0 +1,98 @@
+---
+title: "BigQuery ML integration"
+---
+
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# BigQuery ML integration
+
+With the samples on this page we will demonstrate how to integrate models
exported from [BigQuery ML (BQML)](https://cloud.google.com/bigquery-ml/docs)
into your Apache Beam pipeline using [TFX Basic Shared Libraries
(tfx_bsl)](https://github.com/tensorflow/tfx-bsl).
+
+Roughly, the sections below will go through the following steps in more detail:
+
+1. Create and train your BigQuery ML model
+1. Export your BigQuery ML model
+1. Create a transform that uses the brand-new BigQuery ML model
+
+## Create and train your BigQuery ML model
+
+To be able to incorporate your BQML model into an Apache Beam pipeline using
tfx_bsl, it has to be in the [TensorFlow
SavedModel](https://www.tensorflow.org/guide/saved_model) format. An overview
that maps different model types to their export model format for BQML can be
found
[here](https://cloud.google.com/bigquery-ml/docs/exporting-models#export_model_formats_and_samples).
+
+For the sake of simplicity, we'll be training a (simplified version of the)
logistic regression model in the [BQML quickstart
guide](https://cloud.google.com/bigquery-ml/docs/bigqueryml-web-ui-start),
using the publicly available Google Analytics sample dataset. An overview of
all models you can create using BQML can be found
[here](https://cloud.google.com/bigquery-ml/docs/introduction#supported_models_in).
+
+After creating a BigQuery dataset, you continue to create the model, which is
fully defined in SQL:
+
+```SQL
+CREATE MODEL IF NOT EXISTS `bqml_tutorial.sample_model`
+OPTIONS(model_type='logistic_reg', input_label_cols=["label"]) AS
+SELECT
+ IF(totals.transactions IS NULL, 0, 1) AS label,
+ IFNULL(geoNetwork.country, "") AS country
+FROM
+ `bigquery-public-data.google_analytics_sample.ga_sessions_*`
+WHERE
+ _TABLE_SUFFIX BETWEEN '20160801' AND '20170630'
+```
+
+The model will predict if a purchase will be made given the country of the
visitor on data gathered between 2016-08-01 and 2017-06-30.
+
+## Export your BigQuery ML model
+
+In order to incorporate your model in an Apache Beam pipeline, you will need
to export it. Prerequisites to do so are [installing the `bq` command-line
tool](https://cloud.google.com/bigquery/docs/bq-command-line-tool) and
[creating a Google Cloud Storage
bucket](https://cloud.google.com/storage/docs/creating-buckets) to store your
exported model.
+
+Export the model using the following command:
+
+```bash
+bq extract -m bqml_tutorial.sample_model gs://some/gcs/path
+```
+
+## Create an Apache Beam transform that uses your BigQuery ML model
+
+In this section we will construct an Apache Beam pipeline that will use the
BigQuery ML model we just created and exported. The model can be served using
Google Cloud AI Platform Prediction - for this please refer to the [AI Platform
patterns](https://beam.apache.org/documentation/patterns/ai-platform/). In this
case, we'll be illustrating how to use the tfx_bsl library to do local
predictions (on your Apache Beam workers).
+
+First, the model needs to be downloaded to a local directory where you will be
developing the rest of your pipeline (e.g. to `serving_dir/sample_model/1`).
+
+Then, you can start developing your pipeline like you would normally do. We
will be using the `RunInference` PTransform from the
[tfx_bsl](https://github.com/tensorflow/tfx-bsl) library, and we will point it
to our local directory where the model is stored (see the `model_path` variable
in the code example). The transform takes elements of the type
`tf.train.Example` as inputs and outputs elements of the type
`tensorflow_serving.apis.prediction_log_pb2.PredictionLog`.
+
+```python
+import apache_beam as beam
+import tensorflow as tf
+from google.protobuf import text_format
+from tfx_bsl.beam import run_inference
+from tfx_bsl.public.beam import RunInference
+from tfx_bsl.public.proto import model_spec_pb2
+
+
+inputs = """
+features {
+ feature { key: "country" value { bytes_list { value: 'Belgium' }}}
+}
+"""
+
+def create_tf_example(json_obj):
+ return text_format.Parse(json_obj, tf.train.Example())
Review comment:
Might be worth looking at the TF APIs for this, as they may be more
familiar to folks working with TensorFlow:
https://www.tensorflow.org/tutorials/load_data/tfrecord
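A minimal sketch of what this suggestion could look like, assuming the model's only input is the `country` string feature from the SQL query. The helper keeps the name `create_tf_example` from the diff, but its signature here (a plain string argument) is hypothetical. It builds the `tf.train.Example` with the `tf.train.Feature` constructors instead of parsing a text-format proto:

```python
import tensorflow as tf


def create_tf_example(country):
    """Build a tf.train.Example holding one bytes feature named 'country'."""
    # tf.train.BytesList expects bytes, so the Python string is UTF-8 encoded.
    country_feature = tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[country.encode("utf-8")]))
    return tf.train.Example(
        features=tf.train.Features(feature={"country": country_feature}))


# Hypothetical input value, matching the one used in the diff.
example = create_tf_example("Belgium")
```

The resulting proto could be passed to `beam.Create` in the same way as the text-format version.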
##########
File path: website/www/site/content/en/documentation/patterns/bqml.md
##########
@@ -0,0 +1,98 @@
+```SQL
+CREATE MODEL IF NOT EXISTS `bqml_tutorial.sample_model`
+OPTIONS(model_type='logistic_reg', input_label_cols=["label"]) AS
+SELECT
+ IF(totals.transactions IS NULL, 0, 1) AS label,
+ IFNULL(geoNetwork.country, "") AS country
+FROM
+ `bigquery-public-data.google_analytics_sample.ga_sessions_*`
+WHERE
+ _TABLE_SUFFIX BETWEEN '20160801' AND '20170630'
Review comment:
Does this table not have time partitioning enabled?
##########
File path: website/www/site/content/en/documentation/patterns/bqml.md
##########
@@ -0,0 +1,98 @@
+```python
+import apache_beam as beam
+import tensorflow as tf
+from google.protobuf import text_format
+from tfx_bsl.beam import run_inference
+from tfx_bsl.public.beam import RunInference
+from tfx_bsl.public.proto import model_spec_pb2
+
+
+inputs = """
+features {
+ feature { key: "country" value { bytes_list { value: 'Belgium' }}}
+}
+"""
+
+def create_tf_example(json_obj):
+ return text_format.Parse(json_obj, tf.train.Example())
+
+model_path = "serving_dir/sample_model/1"
+
+with beam.Pipeline() as p:
+ res = (
+ p
+ | beam.Create([
+ create_tf_example(inputs)
+ ])
+ | RunInference(
+ model_spec_pb2.InferenceSpecType(
+ saved_model_spec=model_spec_pb2.SavedModelSpec(
+ model_path=model_path,
+ signature_name=['serving_default']))))
+```
Review comment:
It would be useful to show how to parse the results coming from
RunInference, which are `PredictionLog` protos.
Issue Time Tracking
-------------------
Worklog Id: (was: 559138)
Time Spent: 1.5h (was: 1h 20m)
> BQML pattern
> ------------
>
> Key: BEAM-11544
> URL: https://issues.apache.org/jira/browse/BEAM-11544
> Project: Beam
> Issue Type: Bug
> Components: website
> Reporter: Matthias Baetens
> Priority: P3
> Labels: pipeline-patterns
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Develop a pattern that trains a BQML model and integrates it into an Apache
> Beam pipeline using local inference.