rszper commented on code in PR #30689:
URL: https://github.com/apache/beam/pull/30689#discussion_r1535824242
##########
examples/notebooks/beam-ml/vertex_ai_feature_store_enrichment.ipynb:
##########
@@ -0,0 +1,2601 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "fFjof1NgAJwu",
+ "cellView": "form"
+ },
+ "outputs": [],
+ "source": [
+ "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+ "\n",
+ "# Licensed to the Apache Software Foundation (ASF) under one\n",
+ "# or more contributor license agreements. See the NOTICE file\n",
+ "# distributed with this work for additional information\n",
+ "# regarding copyright ownership. The ASF licenses this file\n",
+ "# to you under the Apache License, Version 2.0 (the\n",
+ "# \"License\"); you may not use this file except in compliance\n",
+ "# with the License. You may obtain a copy of the License at\n",
+ "#\n",
+ "# http://www.apache.org/licenses/LICENSE-2.0\n",
+ "#\n",
+ "# Unless required by applicable law or agreed to in writing,\n",
+ "# software distributed under the License is distributed on an\n",
+ "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+ "# KIND, either express or implied. See the License for the\n",
+ "# specific language governing permissions and limitations\n",
+ "# under the License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "A8xNRyZMW1yK"
+ },
+ "source": [
+ "# Use Apache Beam and Vertex AI Feature Store to enrich data\n",
+ "\n",
+ "<table align=\"left\">\n",
+ " <td>\n",
+ " <a target=\"_blank\" href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/vertex_ai_feature_store_enrichment.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+ " </td>\n",
+ " <td>\n",
+ " <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/vertex_ai_feature_store_enrichment.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+ " </td>\n",
+ "</table>\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "HrCtxslBGK8Z"
+ },
+ "source": [
+ "This notebook shows how to enrich data by using the Apache Beam [enrichment transform](https://beam.apache.org/documentation/transforms/python/elementwise/enrichment/) with [Vertex AI Feature Store](https://cloud.google.com/vertex-ai/docs). The enrichment transform is a turnkey transform in Apache Beam that lets you enrich data by using a key-value lookup. This transform has the following features:\n",
+ "\n",
+ "- The transform has a built-in Apache Beam handler that interacts with Vertex AI to get precomputed feature values.\n",
+ "- The transform uses client-side throttling to rate limit requests.\n",
+ "- Optionally, you can configure a Redis cache to improve efficiency.\n",
+ "\n",
+ "As of version 2.55.0, the enrichment transform supports [online feature serving](https://cloud.google.com/vertex-ai/docs/featurestore/latest/overview#online_serving) through both Bigtable online serving and the Vertex AI Feature Store (Legacy) method. This notebook demonstrates how to use the Bigtable online serving approach with the enrichment transform in an Apache Beam pipeline."
+ ]
+ },
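Conceptually, the enrichment transform looks up each incoming element by a key and joins the returned feature values into it. The following plain-Python sketch illustrates only that idea. It does not use the Apache Beam API: `FEATURE_STORE` and `enrich` are hypothetical stand-ins, and the sample rows are copied from the training data shown later in this notebook. The real handler queries Vertex AI and adds throttling and optional caching.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical in-memory stand-in for precomputed feature values, keyed by user_id.
FEATURE_STORE: Dict[str, Dict[str, Any]] = {
    "68717": {"age": 43, "gender": "f", "state": "sachsen", "country": "germany"},
    "59866": {"age": 17, "gender": "m", "state": "chongqing", "country": "china"},
}


def enrich(events: Iterable[Dict[str, Any]], key: str) -> Iterator[Dict[str, Any]]:
    """Joins each event with the feature row stored under the event's key value."""
    for event in events:
        # Key-value lookup; an unknown key yields no extra fields.
        features = FEATURE_STORE.get(event[key], {})
        # Merge the looked-up features into a new dict so the input is not mutated.
        yield {**event, **features}


events = [{"product_id": 14235, "user_id": "68717", "sale_price": 0.02}]
print(list(enrich(events, key="user_id")))
```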
+ {
+ "cell_type": "markdown",
+ "source": [
+ "This notebook demonstrates the following ecommerce product recommendation use case, which is based on the BigQuery public dataset [thelook-ecommerce](https://pantheon.corp.google.com/marketplace/product/bigquery-public-data/thelook-ecommerce):\n",
+ "\n",
+ "A stream of online transactions from [Pub/Sub](https://cloud.google.com/pubsub/docs/guides) contains the following fields: `product_id`, `user_id`, and `sale_price`. A machine learning model deployed on Vertex AI uses the following features: `product_id`, `user_id`, `sale_price`, `age`, `gender`, `state`, and `country`. These feature values are precomputed and stored in the Vertex AI online feature store. The pipeline uses this precomputed data to enrich the incoming stream of Pub/Sub events with demographic information. The enriched data is then sent to the Vertex AI model for online prediction, which predicts the product to recommend to the user."
+ ],
+ "metadata": {
+ "id": "ltn5zrBiGS9C"
+ }
+ },
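Before prediction, each enriched record has to be flattened into one input row whose columns are in the same order that the model was trained on. A minimal sketch, assuming the column order of the `train_data` dataframe produced later in this notebook; `MODEL_COLUMNS` and `to_model_input` are hypothetical names. The categorical values also need the same integer encoding that is applied during training.

```python
from typing import Any, Dict, List, Tuple

# Column order of the training dataframe in this notebook (assumed model input order).
MODEL_COLUMNS: Tuple[str, ...] = (
    "user_id", "product_id", "sale_price", "age", "gender", "state", "country",
)


def to_model_input(record: Dict[str, Any]) -> List[Any]:
    """Returns the record's feature values ordered as MODEL_COLUMNS."""
    missing = [column for column in MODEL_COLUMNS if column not in record]
    if missing:
        # Fail loudly instead of sending a short or misaligned row to the model.
        raise KeyError(f"record is missing features: {missing}")
    return [record[column] for column in MODEL_COLUMNS]
```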
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "gVCtGOKTHMm4"
+ },
+ "source": [
+ "## Before you begin\n",
+ "Set up your environment and download dependencies."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "YDHPlMjZRuY0"
+ },
+ "source": [
+ "### Install Apache Beam\n",
+ "To use the enrichment transform with the built-in Vertex AI handler, install the Apache Beam SDK version 2.55.0 or later."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "jBakpNZnAhqk",
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": [
+ "!pip install apache_beam[interactive,gcp]==2.55.0 --quiet\n",
+ "!pip install redis\n",
+ "\n",
+ "# Use TensorFlow 2.13 because it is the latest version that has a prebuilt\n",
+ "# container image for Vertex AI model deployment.\n",
+ "# See https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers#tensorflow\n",
+ "!pip install tensorflow==2.13"
+ ]
+ },
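Because the pinned version in the install command can be edited, you might want to confirm at runtime that the installed SDK still meets the 2.55.0 minimum. The following sketch compares only the numeric release part of a version string, so a pre-release such as `2.55.0rc1` counts as `2.55.0`. `meets_minimum` and `release_tuple` are hypothetical helpers; `packaging.version.Version` is a more complete option if that package is available.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Extracts the leading numeric release, for example '2.55.0rc1' -> (2, 55, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = "2.55.0") -> bool:
    """Returns True if installed is the same as or newer than minimum."""
    have, need = release_tuple(installed), release_tuple(minimum)
    # Pad with zeros so that '2.55' compares equal to '2.55.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

In the notebook, a check could look like `import apache_beam` followed by `assert meets_minimum(apache_beam.__version__)`.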
+ {
+ "cell_type": "code",
+ "source": [
+ "import json\n",
+ "import math\n",
+ "import os\n",
+ "import time\n",
+ "\n",
+ "from typing import Any\n",
+ "from typing import Dict\n",
+ "\n",
+ "import pandas as pd\n",
+ "from google.cloud import aiplatform\n",
+ "from google.cloud import pubsub_v1\n",
+ "from google.cloud import bigquery\n",
+ "from google.cloud import storage\n",
+ "from google.cloud.aiplatform_v1 import FeatureOnlineStoreAdminServiceClient\n",
+ "from google.cloud.aiplatform_v1 import FeatureRegistryServiceClient\n",
+ "from google.cloud.aiplatform_v1.types import feature_view as feature_view_pb2\n",
+ "from google.cloud.aiplatform_v1.types import \\\n",
+ " feature_online_store as feature_online_store_pb2\n",
+ "from google.cloud.aiplatform_v1.types import \\\n",
+ " feature_online_store_admin_service as \\\n",
+ " feature_online_store_admin_service_pb2\n",
+ "\n",
+ "import apache_beam as beam\n",
+ "import tensorflow as tf\n",
+ "import apache_beam.runners.interactive.interactive_beam as ib\n",
+ "from apache_beam.ml.inference.base import RunInference\n",
+ "from apache_beam.ml.inference.vertex_ai_inference import VertexAIModelHandlerJSON\n",
+ "from apache_beam.options import pipeline_options\n",
+ "from apache_beam.runners.interactive.interactive_runner import InteractiveRunner\n",
+ "from apache_beam.transforms.enrichment import Enrichment\n",
+ "from apache_beam.transforms.enrichment_handlers.vertex_ai_feature_store import VertexAIFeatureStoreEnrichmentHandler\n",
+ "from tensorflow import keras\n",
+ "from tensorflow.keras import layers"
+ ],
+ "metadata": {
+ "id": "SiJii48A2Rnb"
+ },
+ "execution_count": 1,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "X80jy3FqHjK4"
+ },
+ "source": [
+ "### Authenticate with Google Cloud\n",
+ "This notebook reads data from Pub/Sub and Vertex AI. To use your Google Cloud account, authenticate this notebook."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "id": "Kz9sccyGBqz3"
+ },
+ "outputs": [],
+ "source": [
+ "from google.colab import auth\n",
+ "auth.authenticate_user()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Replace `<PROJECT_ID>` and `<LOCATION>` with your Google Cloud project ID and the region to use (for example, `us-central1`)."
+ ],
+ "metadata": {
+ "id": "nAmGgUMt48o9"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "id": "wEXucyi2liij"
+ },
+ "outputs": [],
+ "source": [
+ "PROJECT_ID = \"<PROJECT_ID>\"\n",
+ "LOCATION = \"<LOCATION>\""
+ ]
+ },
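If the placeholder strings are left in place, the notebook is likely to fail later, inside API calls, rather than at this step. A small guard such as the following hypothetical `require_configured` helper (not part of any library) can surface the problem early, assuming placeholders keep the `<NAME>` form used above.

```python
import re


def require_configured(name: str, value: str) -> str:
    """Returns value, or raises ValueError if it is empty or still a <PLACEHOLDER>."""
    # The placeholders in this notebook have the form <PROJECT_ID> or <LOCATION>.
    if not value or re.fullmatch(r"<[^<>]+>", value):
        raise ValueError(f"Set {name} before running the notebook (got {value!r}).")
    return value
```

For example, call `require_configured("PROJECT_ID", PROJECT_ID)` and `require_configured("LOCATION", LOCATION)` right after setting the variables.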
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### Train and deploy the model to Vertex AI\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "RpqZFfFfA_Dt"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Fetch the training data from the BigQuery public dataset [thelook-ecommerce](https://pantheon.corp.google.com/marketplace/product/bigquery-public-data/thelook-ecommerce)."
+ ],
+ "metadata": {
+ "id": "8cUpV7mkB_xE"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "train_data_query = \"\"\"\n",
+ "WITH\n",
+ " order_items AS (\n",
+ " SELECT cast(user_id as string) AS user_id,\n",
+ " product_id,\n",
+ " sale_price,\n",
+ " FROM `bigquery-public-data.thelook_ecommerce.order_items`),\n",
+ " users AS (\n",
+ " SELECT cast(id as string) AS user_id,\n",
+ " age,\n",
+ " lower(gender) as gender,\n",
+ " lower(state) as state,\n",
+ " lower(country) as country,\n",
+ " FROM `bigquery-public-data.thelook_ecommerce.users`)\n",
+ "SELECT *\n",
+ "FROM order_items\n",
+ "LEFT OUTER JOIN users\n",
+ "USING (user_id)\n",
+ "\"\"\"\n",
+ "\n",
+ "client = bigquery.Client(project=PROJECT_ID)\n",
+ "train_data = client.query(train_data_query).result().to_dataframe()\n",
+ "train_data.head()"
+ ],
+ "metadata": {
+ "id": "TpxDHGObBEsj",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 206
+ },
+ "outputId": "4f7afe32-a72b-40d3-b9ae-cc999ad104b8"
+ },
+ "execution_count": 4,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " user_id product_id sale_price age gender state country\n",
+ "0 68717 14235 0.02 43 f sachsen germany\n",
+ "1 59866 28700 1.50 17 m chongqing china\n",
+ "2 38322 14202 1.50 47 f missouri united states\n",
+ "3 7839 28700 1.50 64 m mato grosso brasil\n",
+ "4 40877 28700 1.50 68 m sergipe brasil"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "variable_name": "train_data"
+ }
+ },
+ "metadata": {},
+ "execution_count": 4
+ }
+ ]
+ },
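The query keeps every order item because it uses `LEFT OUTER JOIN ... USING (user_id)`: when an order item has no matching row in `users`, the demographic columns come back as `NULL`, which appear as missing values in the dataframe. The following plain-Python sketch shows those semantics; `left_outer_join` and the `unknown-user` row are made up for illustration. Because the data is later converted to `float32` tensors, checking for missing values (for example, with `train_data.isna().sum()`) before training may be worthwhile.

```python
from typing import Any, Dict, List


def left_outer_join(
    left: List[Dict[str, Any]],
    right: List[Dict[str, Any]],
    key: str,
    right_columns: List[str],
) -> List[Dict[str, Any]]:
    """Keeps every left row; right-side columns are None when no key matches."""
    # Index the right side by key (assumes one row per key, like one row per user).
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = index.get(row[key])
        extra = {col: (match[col] if match is not None else None) for col in right_columns}
        joined.append({**row, **extra})
    return joined


order_items = [
    {"user_id": "68717", "product_id": 14235},
    {"user_id": "unknown-user", "product_id": 28700},  # Hypothetical unmatched row.
]
users = [{"user_id": "68717", "age": 43, "country": "germany"}]
rows = left_outer_join(order_items, users, "user_id", ["age", "country"])
```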
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Create the `prediction_data` series, which contains the `product_id` values that the model learns to recommend (the training labels). Also, preprocess the columns that contain categorical values."
+ ],
+ "metadata": {
+ "id": "OkYcJPC0THoV"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Create the prediction data (training labels) by sampling product IDs.\n",
+ "prediction_data = train_data['product_id'].sample(frac=1, replace=True)\n",
+ "\n",
+ "# Encode the categorical columns as integer codes.\n",
+ "train_data['gender'] = pd.factorize(train_data['gender'])[0]\n",
+ "train_data['state'] = pd.factorize(train_data['state'])[0]\n",
+ "train_data['country'] = pd.factorize(train_data['country'])[0]\n",
+ "train_data.head()"
+ ],
+ "metadata": {
+ "id": "ej6jCkMF0B29",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 206
+ },
+ "outputId": "44cfd7f1-0c7c-40a8-813f-02af86a6f788"
+ },
+ "execution_count": 5,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " user_id product_id sale_price age gender state country\n",
+ "0 68717 14235 0.02 43 0 0 0\n",
+ "1 59866 28700 1.50 17 1 1 1\n",
+ "2 38322 14202 1.50 47 0 2 2\n",
+ "3 7839 28700 1.50 64 1 3 3\n",
+ "4 40877 28700 1.50 68 1 4 3"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "variable_name": "train_data"
+ }
+ },
+ "metadata": {},
+ "execution_count": 5
+ }
+ ]
+ },
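`pd.factorize` assigns integer codes in order of first appearance and, by default, marks missing values with `-1`. The following pure-Python sketch reproduces that behavior for illustration (it is not the pandas implementation). It also shows a consequence: the codes depend on row order, so you likely need to keep the returned `uniques` and reuse the same mapping when encoding features at prediction time.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def factorize(values: Sequence[Optional[Hashable]]) -> Tuple[List[int], List[Hashable]]:
    """Encodes values as integer codes in order of first appearance."""
    codes: List[int] = []
    uniques: List[Hashable] = []
    seen: Dict[Hashable, int] = {}
    for value in values:
        if value is None:
            # Missing values get the sentinel code -1, like pandas' default.
            codes.append(-1)
            continue
        if value not in seen:
            seen[value] = len(uniques)
            uniques.append(value)
        codes.append(seen[value])
    return codes, uniques


# The first five gender values from the query output encode to [0, 1, 0, 1, 1].
print(factorize(["f", "m", "f", "m", "m"]))
```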
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Convert the training dataframe and the prediction data to tensors."
+ ],
+ "metadata": {
+ "id": "7ffoopdQVk8W"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "train_tensors = tf.convert_to_tensor(train_data.values, dtype=tf.float32)\n",
+ "prediction_tensors = tf.convert_to_tensor(prediction_data.values, dtype=tf.float32)"
+ ],
+ "metadata": {
+ "id": "vmHH26KDVkuf"
+ },
+ "execution_count": 6,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Based on this data, build a simple neural network model by using TensorFlow."
+ ],
+ "metadata": {
+ "id": "CRoW8ElNV4I9"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "inputs = layers.Input(shape=(7,))\n",
+ "x = layers.Dense(7, activation='relu')(inputs)\n",
+ "x = layers.Dense(14, activation='relu')(x)\n",
+ "outputs = layers.Dense(1)(x)\n",
+ "\n",
+ "model = keras.Model(inputs=inputs, outputs=outputs)"
+ ],
+ "metadata": {
+ "id": "EKrb13wsV3m4"
+ },
+ "execution_count": 7,
+ "outputs": []
+ },
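Each `Dense` layer has an input-by-units weight matrix plus one bias per unit, so this network has (7 * 7 + 7) + (7 * 14 + 14) + (14 * 1 + 1) = 56 + 112 + 15 = 183 trainable parameters, which is the total that `model.summary()` should report. The same arithmetic as a small helper; `dense_param_count` is a hypothetical name used only for illustration.

```python
from typing import Sequence


def dense_param_count(input_dim: int, units: Sequence[int]) -> int:
    """Counts weights plus biases for a stack of fully connected layers."""
    total = 0
    fan_in = input_dim
    for n in units:
        # Each Dense layer has a fan_in x n kernel and n biases.
        total += fan_in * n + n
        fan_in = n
    return total


print(dense_param_count(7, [7, 14, 1]))  # 183
```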
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Train the model. Training one epoch takes approximately 1 minute and 30 seconds."
+ ],
+ "metadata": {
+ "id": "Duv4qzmEWFSZ"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "EPOCHS = 1"
+ ],
+ "metadata": {
+ "id": "bHg1kcvnk7Xb"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "model.compile(optimizer='adam', loss='mse')\n",
+ "model.fit(train_tensors, prediction_tensors, epochs=EPOCHS)"
+ ],
+ "metadata": {
+ "id": "4GrDp5_WWGZv"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
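With `loss='mse'`, training minimizes the mean of the squared differences between the target `product_id` values and the model outputs, which treats product IDs as continuous numbers. A minimal pure-Python version of that loss, for illustration only (it is not the Keras implementation):

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Averages the squared differences between targets and predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("Inputs must be non-empty and the same length.")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))  # 2.0
```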
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Save the model to the directory specified by the `MODEL_PATH` variable.\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "_rJYv8fFFPYb"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Create a directory for the model. The -p flag avoids an error if it already exists.\n",
+ "!mkdir -p model\n",
+ "\n",
+ "# Save the model.\n",
+ "MODEL_PATH = './model/'\n",
+ "tf.saved_model.save(model, MODEL_PATH)"
+ ],
+ "metadata": {
+ "id": "W4t260o9FURP"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Upload the locally saved model to a Cloud Storage bucket. You use this bucket to deploy the model to Vertex AI."
+ ],
+ "metadata": {
+ "id": "hsJOxFTWj6JX"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
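+ "# Replace these values with your Cloud Storage bucket name and the directory to upload the model to.\n",
+ "GCS_BUCKET = 'GCS_BUCKET_NAME'\n",
+ "GCS_BUCKET_DIRECTORY = 'GCS_BUCKET_DIRECTORY'"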
+ ],
+ "metadata": {
+ "id": "WQp1e_JgllBW"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Upload the model files to the Cloud Storage bucket.\n",
+ "import glob\n",
+ "from google.cloud import storage\n",
+ "client = storage.Client(project=PROJECT_ID)\n",
+ "bucket = client.bucket(GCS_BUCKET)\n",
+ "\n",
+ "def upload_model_to_gcs(model_path, bucket, gcs_model_dir):\n",
+ "    # Upload every file under model_path, keeping its path relative to model_path.\n",
+ "    for file in glob.glob(model_path + '/**', recursive=True):\n",
+ "        if os.path.isfile(file):\n",
+ "            path = os.path.join(gcs_model_dir, file[1 + len(model_path.rstrip(\"/\")):])\n",
+ "            blob = bucket.blob(path)\n",
+ "            blob.upload_from_filename(file)\n",
+ "\n",
+ "\n",
+ "upload_model_to_gcs(MODEL_PATH, bucket, GCS_BUCKET_DIRECTORY)"
+ ],
+ "metadata": {
+ "id": "yiXRXV89e8_Y"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
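+ {
+ "cell_type": "code",
+ "source": [
+ "# Optional check, sketched under the assumption that `client` is still the\n",
+ "# Cloud Storage client created in the previous cell: list the objects under\n",
+ "# GCS_BUCKET_DIRECTORY to confirm that the model files were uploaded.\n",
+ "for blob in client.list_blobs(GCS_BUCKET, prefix=GCS_BUCKET_DIRECTORY):\n",
+ "  print(blob.name)"
+ ],
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },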
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Upload the model from the Cloud Storage bucket to the Vertex AI Model Registry."
+ ],
+ "metadata": {
+ "id": "O72h009kl_-L"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "model_display_name = 'vertex-ai-enrichment'"
+ ],
+ "metadata": {
+ "id": "bKN5pUD3uImj"
+ },
+ "execution_count": 4,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "aiplatform.init(project=PROJECT_ID, location=LOCATION)\n",
+ "model = aiplatform.Model.upload(\n",
+ "    display_name=model_display_name,\n",
+ "    description='Model used in the Vertex AI enrichment notebook.',\n",
+ "    artifact_uri=f'gs://{GCS_BUCKET}/{GCS_BUCKET_DIRECTORY}',\n",
+ "    serving_container_image_uri='us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-13:latest',\n",
+ ")"
+ ],
+ "metadata": {
+ "id": "Pp3Jca9GfpEj"
+ },
+ "execution_count": 5,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Create an endpoint on Vertex AI."
+ ],
+ "metadata": {
+ "id": "ms_KqSIbZkLP"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "endpoint = aiplatform.Endpoint.create(display_name=model_display_name,\n",
+ "                                      project=PROJECT_ID,\n",
+ "                                      location=LOCATION)"
+ ],
+ "metadata": {
+ "id": "YKKzRrN6czni",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "bfd954c0-8267-476d-dd0c-15e612ae0cc1"
+ },
+ "execution_count": 30,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "INFO:google.cloud.aiplatform.models:Creating Endpoint\n",
+ "INFO:google.cloud.aiplatform.models:Create Endpoint backing LRO: projects/927334603519/locations/us-central1/endpoints/5369128583685996544/operations/3775856005049483264\n",
+ "INFO:google.cloud.aiplatform.models:Endpoint created. Resource name: projects/927334603519/locations/us-central1/endpoints/5369128583685996544\n",
+ "INFO:google.cloud.aiplatform.models:To use this Endpoint in another session:\n",
+ "INFO:google.cloud.aiplatform.models:endpoint = aiplatform.Endpoint('projects/927334603519/locations/us-central1/endpoints/5369128583685996544')\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Deploy the model to the Vertex AI endpoint.\n",
+ "\n",
+ "**Note:** This step is a long-running operation (LRO). Depending on the size of the model, it might take more than five minutes to complete."
+ ],
+ "metadata": {
+ "id": "WgSpy0J3oBFP"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "deployed_model_display_name = 'vertexai-enrichment-notebook'\n",
+ "model.deploy(endpoint=endpoint,\n",
+ "             deployed_model_display_name=deployed_model_display_name,\n",
+ "             machine_type='n1-standard-2')"
+ ],
+ "metadata": {
+ "id": "FLQtMVQjnsls"
+ },
+ "execution_count": 6,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Use the resource ID of the endpoint created above. The endpoint's display\n",
+ "# name is `model_display_name`, not `deployed_model_display_name`.\n",
+ "model_endpoint_id = endpoint.name\n",
+ "print(model_endpoint_id)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "3JjIwzZouAi5",
+ "outputId": "ffb1fb74-365a-426b-d60d-d3910c116e10"
+ },
+ "execution_count": 7,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "8125472293125095424\n"
+ ]
+ }
+ ]
+ },
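+ {
+ "cell_type": "code",
+ "source": [
+ "# Optional check, a minimal sketch using `Endpoint.list_models()` from the\n",
+ "# Vertex AI SDK: list the models deployed to the endpoint. After the deploy\n",
+ "# step, one entry should carry `deployed_model_display_name`.\n",
+ "for deployed_model in endpoint.list_models():\n",
+ "  print(deployed_model.id, deployed_model.display_name)"
+ ],
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },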
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### Set up the Vertex AI Feature Store for online serving\n"
+ ],
+ "metadata": {
+ "id": "ouMQZ4sC4zuO"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Fetch the user features from the public `thelook_ecommerce` BigQuery dataset and encode the categorical columns."
+ ],
+ "metadata": {
+ "id": "B1Bk7XP7190z"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "feature_store_query = \"\"\"\n",
+ "SELECT cast(id as string) AS user_id,\n",
+ " age,\n",
+ " lower(gender) as gender,\n",
+ " lower(state) as state,\n",
+ " lower(country) as country,\n",
+ "FROM `bigquery-public-data.thelook_ecommerce.users`\n",
+ "\"\"\"\n",
+ "\n",
+ "# Fetch feature values from BigQuery\n",
+ "client = bigquery.Client(project=PROJECT_ID)\n",
+ "data = client.query(feature_store_query).result().to_dataframe()\n",
+ "\n",
+ "# Encode the categorical features as integer codes and store them as strings.\n",
+ "# This helps when creating a tensor of these values for inference, which\n",
+ "# requires all values to have the same data type.\n",
+ "data['gender'] = pd.factorize(data['gender'])[0]\n",
+ "data['gender'] = data['gender'].astype(str)\n",
+ "data['state'] = pd.factorize(data['state'])[0]\n",
+ "data['state'] = data['state'].astype(str)\n",
+ "data['country'] = pd.factorize(data['country'])[0]\n",
+ "data['country'] = data['country'].astype(str)\n",
+ "data.head()"
+ ],
+ "metadata": {
+ "id": "4Qkysu_g19c_",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 206
+ },
+ "outputId": "187ee1e8-07c9-457a-abbe-fab724d997ce"
+ },
+ "execution_count": 8,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " user_id age gender state country\n",
+ "0 7723 12 0 0 0\n",
+ "1 93041 12 0 1 1\n",
+ "2 45741 12 1 1 1\n",
+ "3 16718 12 0 1 1\n",
+ "4 70137 12 1 1 1"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 8
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Create a BigQuery dataset to serve as the source for the Vertex AI Feature Store."
+ ],
+ "metadata": {
+ "id": "Mm-HCUaa3ROZ"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "dataset_id = \"vertexai_enrichment\"\n",
+ "dataset = bigquery.Dataset(f\"{PROJECT_ID}.{dataset_id}\")\n",
+ "dataset.location = \"US\"\n",
+ "dataset = client.create_dataset(\n",
+ " dataset, exists_ok=True, timeout=30\n",
+ ")\n",
+ "\n",
+ "print(f\"Created dataset {dataset.project}.{dataset.dataset_id}\")"
+ ],
+ "metadata": {
+ "id": "vye3UBGZ3Q8n",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "437597af-837d-483e-8c1e-ebbe0eca81e0"
+ },
+ "execution_count": 9,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Created dataset - Dataset(DatasetReference('google.com:clouddfe', 'vertexai_enrichment')).vertexai_enrichment\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Load the precomputed feature values into a BigQuery table."
+ ],
+ "metadata": {
+ "id": "7lKiprPX4AZy"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "view_id = \"users_view\"\n",
+ "view_reference = \"%s.%s.%s\" % (PROJECT_ID, dataset_id, view_id)\n",
+ "\n",
+ "# Load the DataFrame into a BigQuery table and wait for the load job to finish.\n",
+ "load_job = client.load_table_from_dataframe(data, view_reference)\n",
+ "load_job.result()"
+ ],
+ "metadata": {
+ "id": "xqaLPTxb4DDF"
+ },
+ "execution_count": 10,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Initialize the Vertex AI clients used to create and set up an online store."
+ ],
+ "metadata": {
+ "id": "eQLkSg3p7WAm"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "API_ENDPOINT = f\"{LOCATION}-aiplatform.googleapis.com\"\n",
+ "\n",
+ "admin_client = FeatureOnlineStoreAdminServiceClient(\n",
+ " client_options={\"api_endpoint\": API_ENDPOINT}\n",
+ ")\n",
+ "registry_client = FeatureRegistryServiceClient(\n",
+ " client_options={\"api_endpoint\": API_ENDPOINT}\n",
+ ")"
+ ],
+ "metadata": {
+ "id": "GF_eIl-wVvRy"
+ },
+ "execution_count": 11,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Create an online store instance on Vertex AI."
+ ],
+ "metadata": {
+ "id": "d9Mbk6m9Vgdo"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "feature_store_name = \"vertexai_enrichment\"\n",
+ "\n",
+ "online_store_config = feature_online_store_pb2.FeatureOnlineStore(\n",
+ " bigtable=feature_online_store_pb2.FeatureOnlineStore.Bigtable(\n",
+ " auto_scaling=feature_online_store_pb2.FeatureOnlineStore.Bigtable.AutoScaling(\n",
+ " min_node_count=1, max_node_count=1, cpu_utilization_target=80\n",
+ " )\n",
+ " )\n",
+ ")\n",
+ "\n",
+ "create_store_lro = admin_client.create_feature_online_store(\n",
+ " feature_online_store_admin_service_pb2.CreateFeatureOnlineStoreRequest(\n",
+ " parent=f\"projects/{PROJECT_ID}/locations/{LOCATION}\",\n",
+ " feature_online_store_id=feature_store_name,\n",
+ " feature_online_store=online_store_config,\n",
+ " )\n",
+ ")\n",
+ "\n",
+ "create_store_lro.result()"
+ ],
+ "metadata": {
+ "id": "Zj-xEu_hWY7f",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "7f4ed1d9-c0c4-4c3c-f199-1e340d2cff11"
+ },
+ "execution_count": 12,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "name: \"projects/927334603519/locations/us-central1/featureOnlineStores/vertexai_enrichment\""
+ ]
+ },
+ "metadata": {},
+ "execution_count": 12
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "For the online store created above, create a feature view that uses the BigQuery table as its data source."
+ ],
+ "metadata": {
+ "id": "DAHjWlqXXLU_"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "feature_view_name = \"users\"\n",
+ "\n",
+ "bigquery_source = feature_view_pb2.FeatureView.BigQuerySource(\n",
+ " uri=f\"bq://{view_reference}\", entity_id_columns=[\"user_id\"]\n",
+ ")\n",
+ "\n",
+ "create_view_lro = admin_client.create_feature_view(\n",
+ " feature_online_store_admin_service_pb2.CreateFeatureViewRequest(\n",
+ " parent=f\"projects/{PROJECT_ID}/locations/{LOCATION}/featureOnlineStores/{feature_store_name}\",\n",
+ " feature_view_id=feature_view_name,\n",
+ " feature_view=feature_view_pb2.FeatureView(\n",
+ " big_query_source=bigquery_source,\n",
+ " ),\n",
+ " )\n",
+ ")\n",
+ "\n",
+ "create_view_lro.result()"
+ ],
+ "metadata": {
+ "id": "IhUERuRGXNaN",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "84facd77-5be4-4c99-90b5-d8ccb4c5d702"
+ },
+ "execution_count": 13,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "name: \"projects/927334603519/locations/us-central1/featureOnlineStores/vertexai_enrichment/featureViews/users\""
+ ]
+ },
+ "metadata": {},
+ "execution_count": 13
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Sync the feature values from BigQuery into the online store."
+ ],
+ "metadata": {
+ "id": "qbf4l8eBX6NG"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "sync_response = admin_client.sync_feature_view(\n",
+ " feature_view=f\"projects/{PROJECT_ID}/locations/{LOCATION}/featureOnlineStores/{feature_store_name}/featureViews/{feature_view_name}\"\n",
+ ")"
+ ],
+ "metadata": {
+ "id": "gdpsLCmMX7fX"
+ },
+ "execution_count": 14,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import time\n",
+ "\n",
+ "# Poll the sync every 10 seconds until it finishes.\n",
+ "while True:\n",
+ "  feature_view_sync = admin_client.get_feature_view_sync(\n",
+ "      name=sync_response.feature_view_sync\n",
+ "  )\n",
+ "  # A non-zero end time indicates that the sync run has finished.\n",
+ "  if feature_view_sync.run_time.end_time.seconds > 0:\n",
+ "    if feature_view_sync.final_status.code == 0:\n",
+ "      print(\"feature view sync completed for %s\" % feature_view_sync.name)\n",
+ "    else:\n",
+ "      print(\"feature view sync failed for %s\" % feature_view_sync.name)\n",
+ "    break\n",
+ "  time.sleep(10)"
+ ],
+ "metadata": {
+ "id": "Lav6JTW4YKhR"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
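+ {
+ "cell_type": "code",
+ "source": [
+ "# Optional: a bounded variant of the polling loop above. This is a sketch; the\n",
+ "# helper name `wait_for_feature_view_sync` and the 30-minute default timeout are\n",
+ "# illustrative assumptions. It raises TimeoutError instead of polling forever.\n",
+ "import time\n",
+ "\n",
+ "def wait_for_feature_view_sync(sync_name, timeout_s=1800, poll_interval_s=10):\n",
+ "  deadline = time.monotonic() + timeout_s\n",
+ "  while time.monotonic() < deadline:\n",
+ "    sync = admin_client.get_feature_view_sync(name=sync_name)\n",
+ "    # A non-zero end time indicates that the sync run has finished.\n",
+ "    if sync.run_time.end_time.seconds > 0:\n",
+ "      # Status code 0 means OK, so this returns True on success.\n",
+ "      return sync.final_status.code == 0\n",
+ "    time.sleep(poll_interval_s)\n",
+ "  raise TimeoutError(f'Sync {sync_name} did not finish within {timeout_s} seconds.')\n",
+ "\n",
+ "\n",
+ "print(wait_for_feature_view_sync(sync_response.feature_view_sync))"
+ ],
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },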
+ {
+ "cell_type": "markdown",
+ "source": [
+ "List the feature view syncs to confirm that the sync ran."
+ ],
+ "metadata": {
+ "id": "T3MMx7oJYPeC"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "admin_client.list_feature_view_syncs(\n",
+ " parent=f\"projects/{PROJECT_ID}/locations/{LOCATION}/featureOnlineStores/{feature_store_name}/featureViews/{feature_view_name}\"\n",
+ ")"
+ ],
+ "metadata": {
+ "id": "ucSQRUfUYRFX",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "d2160812-9874-40bb-f464-f797eafb9999"
+ },
+ "execution_count": 16,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "ListFeatureViewSyncsPager<feature_view_syncs {\n",
+ " name: \"projects/google.com:clouddfe/locations/us-central1/featureOnlineStores/vertexai_enrichment/featureViews/users/featureViewSyncs/6596305691974041600\"\n",
+ " create_time {\n",
+ " seconds: 1710904320\n",
+ " nanos: 539780000\n",
+ " }\n",
+ " final_status {\n",
+ " }\n",
+ " run_time {\n",
+ " start_time {\n",
+ " seconds: 1710904320\n",
+ " nanos: 539780000\n",
+ " }\n",
+ " end_time {\n",
+ " seconds: 1710904387\n",
+ " nanos: 621137000\n",
+ " }\n",
+ " }\n",
+ "}\n",
+ ">"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 16
+ }
+ ]
+ },
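+ {
+ "cell_type": "code",
+ "source": [
+ "# Optional check: read one entity back from the online store. This is a sketch\n",
+ "# that assumes the `aiplatform.gapic` FetchFeatureValuesRequest and\n",
+ "# FeatureViewDataKey types, and it uses user ID 7723 from the feature data above.\n",
+ "data_client = aiplatform.gapic.FeatureOnlineStoreServiceClient(\n",
+ "    client_options={'api_endpoint': API_ENDPOINT}\n",
+ ")\n",
+ "response = data_client.fetch_feature_values(\n",
+ "    request=aiplatform.gapic.FetchFeatureValuesRequest(\n",
+ "        feature_view=f'projects/{PROJECT_ID}/locations/{LOCATION}/featureOnlineStores/{feature_store_name}/featureViews/{feature_view_name}',\n",
+ "        data_key=aiplatform.gapic.FeatureViewDataKey(key='7723'),\n",
+ "    )\n",
+ ")\n",
+ "print(response)"
+ ],
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },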
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### Publish messages to Pub/Sub\n",
+ "\n",
+ "Use the Pub/Sub Python client to publish messages.\n"
+ ],
+ "metadata": {
+ "id": "pHODouJDwc60"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Replace <TOPIC_NAME> with the name of your Pub/Sub topic.\n",
+ "TOPIC = \"<TOPIC_NAME>\"\n",
+ "\n",
+ "# Replace <SUBSCRIPTION_NAME> with the name of a subscription to your topic.\n",
+ "SUBSCRIPTION = \"<SUBSCRIPTION_NAME>\""
+ ],
+ "metadata": {
+ "id": "QKCuwDioxw-f"
+ },
+ "execution_count": 17,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Retrieve sample data from a public dataset in BigQuery and convert it into Python dictionaries before sending it to Pub/Sub."
+ ],
+ "metadata": {
+ "id": "R0QYsOYFb_EU"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "read_query = \"\"\"\n",
+ "SELECT cast(user_id as string) AS user_id,\n",
+ " product_id,\n",
+ " sale_price,\n",
+ "FROM `bigquery-public-data.thelook_ecommerce.order_items`\n",
+ "LIMIT 5;\n",
+ "\"\"\"\n",
+ "\n",
+ "client = bigquery.Client(project=PROJECT_ID)\n",
+ "data = client.query(read_query).result().to_dataframe()\n",
+ "data.head()"
+ ],
+ "metadata": {
+ "id": "Kn7wmiKib-Wx",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 206
+ },
+ "outputId": "9680fbcc-dcb5-4158-90ae-69a9f3c776d0"
+ },
+ "execution_count": 18,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " user_id product_id sale_price\n",
+ "0 25005 14235 0.02\n",
+ "1 62544 14235 0.02\n",
+ "2 17228 14235 0.02\n",
+ "3 54015 14235 0.02\n",
+ "4 16569 14235 0.02"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 18
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "messages = data.to_dict(orient='records')\n",
+ "\n",
+ "publisher = pubsub_v1.PublisherClient()\n",
+ "topic_name = publisher.topic_path(PROJECT_ID, TOPIC)\n",
+ "subscription_path = publisher.subscription_path(PROJECT_ID, SUBSCRIPTION)\n",
+ "for message in messages:\n",
+ "  # Serialize each record as UTF-8 JSON and publish it to the topic.\n",
+ "  payload = json.dumps(message).encode('utf-8')\n",
+ "  publish_future = publisher.publish(topic_name, payload)\n",
+ "  # Block until Pub/Sub confirms that the message was published.\n",
+ "  publish_future.result()"
+ ],
+ "metadata": {
+ "id": "MaCJwaPexPKZ"
+ },
+ "execution_count": 19,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Use the Vertex AI Feature Store enrichment handler\n",
+ "\n",
+ "The [`VertexAIFeatureStoreEnrichmentHandler`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.vertex_ai_feature_store.html#apache_beam.transforms.enrichment_handlers.vertex_ai_feature_store.VertexAIFeatureStoreEnrichmentHandler) is a built-in handler included in the Apache Beam SDK versions 2.55.0 and later."
+ ],
+ "metadata": {
+ "id": "zPSFEMm02omi"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The `VertexAIFeatureStoreEnrichmentHandler` requires the following parameters:\n",
+ "\n",
+ "* `project`: The Google Cloud project ID of the feature store.\n",
+ "* `location`: The location of the feature store, for example, `us-central1`.\n",
+ "* `api_endpoint`: The public endpoint of the feature store.\n",
+ "* `feature_store_name`: The name of the Vertex AI Feature Store.\n",
+ "* `feature_view_name`: The name of the feature view within the Vertex AI Feature Store.\n",
+ "* `row_key`: The name of the field in the input row that contains the entity ID for the feature store. The enrichment transform uses this field to extract the entity ID from each element and to fetch the feature values for that element.\n",
+ "\n",
+ "Optionally, `VertexAIFeatureStoreEnrichmentHandler` accepts additional keyword arguments (`**kwargs`) to configure the connection to the Vertex AI client, [`FeatureOnlineStoreServiceClient`](https://cloud.google.com/php/docs/reference/cloud-ai-platform/latest/V1.FeatureOnlineStoreServiceClient).\n",
+ "\n",
+ "**Note:** When exceptions occur, by default, the logging severity is set to warning ([`ExceptionLevel.WARN`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#apache_beam.transforms.enrichment_handlers.bigtable.ExceptionLevel.WARN)). To configure the severity to raise exceptions, set `exception_level` to [`ExceptionLevel.RAISE`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#apache_beam.transforms.enrichment_handlers.bigtable.ExceptionLevel.RAISE). To ignore exceptions, set `exception_level` to [`ExceptionLevel.QUIET`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#apache_beam.transforms.enrichment_handlers.bigtable.ExceptionLevel.QUIET).\n",
+ "\n",
+ "The `VertexAIFeatureStoreEnrichmentHandler` returns the latest feature values from the feature store."
+ ],
+ "metadata": {
+ "id": "K41xhvmA5yQk"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "row_key = 'user_id'"
+ ],
+ "metadata": {
+ "id": "3dB26jhI45gd"
+ },
+ "execution_count": 20,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "vertex_ai_handler = VertexAIFeatureStoreEnrichmentHandler(\n",
+ "    project=PROJECT_ID,\n",
+ "    location=LOCATION,\n",
+ "    api_endpoint=API_ENDPOINT,\n",
+ "    feature_store_name=feature_store_name,\n",
+ "    feature_view_name=feature_view_name,\n",
+ "    row_key=row_key)"
+ ],
+ "metadata": {
+ "id": "cr1j_DHK4gA4"
+ },
+ "execution_count": 21,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Use the enrichment transform\n",
+ "\n",
+ "To use the [enrichment transform](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment.html#apache_beam.transforms.enrichment.Enrichment), the [`EnrichmentHandler`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment.html#apache_beam.transforms.enrichment.EnrichmentSourceHandler) parameter is required. You can also use configuration parameters to specify a `lambda` for a join function, a timeout, a throttler, and a repeater (retry strategy).\n",
+ "\n",
+ "\n",
+ "* `join_fn`: A lambda function that takes dictionaries as input and returns an enriched row (`Callable[[Dict[str, Any], Dict[str, Any]], beam.Row]`). This function specifies how to join the input row with the data fetched from the API. Defaults to a [cross-join](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment.html#apache_beam.transforms.enrichment.cross_join).\n",
+ "* `timeout`: The number of seconds to wait for the API to complete the request before timing out. Defaults to 30 seconds.\n",
+ "* `throttler`: Specifies the throttling mechanism. The only supported option is the default client-side adaptive throttling.\n",
+ "* `repeater`: Specifies the retry strategy when errors like `TooManyRequests` and `TimeoutException` occur. Defaults to [`ExponentialBackOffRepeater`](https://beam.apache.org/releases/pydoc/current/apache_beam.io.requestresponse.html#apache_beam.io.requestresponse.ExponentialBackOffRepeater).\n",
+ "\n",
+ "\n",
+ "To use a Redis cache, apply the `with_redis_cache` hook to the `Enrichment` transform. The coders for encoding and decoding the cache input and output are optional; if you don't provide them, they are inferred internally."
+ ],
+ "metadata": {
+ "id": "-Lvo8O2V-0Ey"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The following example demonstrates the code needed to add this transform to your pipeline.\n",
+ "\n",
+ "```python\n",
+ "with beam.Pipeline() as p:\n",
+ "  output = (p\n",
+ "            ...\n",
+ "            | \"Enrich with Vertex AI\" >> Enrichment(vertex_ai_handler)\n",
+ "            | \"RunInference\" >> RunInference(model_handler)\n",
+ "            ...\n",
+ "            )\n",
+ "```"
+ ],
+ "metadata": {
+ "id": "xJTCfSmiV1kv"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "To make a prediction, use the fields from the input message together with the feature values that the enrichment transform retrieves from Vertex AI Feature Store.\n",
+ "\n",
+ "The enrichment transform performs a [`cross_join`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment.html#apache_beam.transforms.enrichment.cross_join) by default."
+ ],
+ "metadata": {
+ "id": "F-xjiP_pHWZr"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Use the `VertexAIModelHandlerJSON` interface to run inference\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "CX9Cqybu6scV"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Since the enrichment transform outputs data in the format `beam.Row`, in order to align it with the `VertexAIModelHandlerJSON` interface, it needs to be converted into a list of `tensorflow.tensor`. Furthermore, certain enriched fields may be of `string` type, but for `tensor` creation, all values should be of the same type. Therefore, convert any `string` type fields to `int` type before creating a tensor."
Review Comment:
```suggestion
"Because the enrichment transform outputs data in the format `beam.Row`, to align it with the `VertexAIModelHandlerJSON` interface, convert the output into a list of `tensorflow.tensor`. Some enriched fields are `string` type. For tensor creation, all values must be the same type. Therefore, convert any `string` type fields to `int` type fields before creating a tensor."
```
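For reference, here is a minimal sketch of the conversion that the suggestion describes. It assumes that the enriched `beam.Row` is first turned into a dictionary (for example, with `Row.as_dict()`), that every `string` field holds an integer ID such as `"62544"`, and that all numeric values can share a single float type. The helper name `row_values_to_numbers` is hypothetical and is not part of the notebook or the Beam API; the resulting list could then be passed to something like `tf.constant` to build the tensor.

```python
from typing import Any, List, Mapping


def row_values_to_numbers(fields: Mapping[str, Any]) -> List[float]:
  """Flattens an enriched row's fields into a list with one numeric type.

  Hypothetical helper. Assumes `fields` is a dict built from a `beam.Row`
  (for example, with `row.as_dict()`) and that string fields contain
  integer IDs.
  """
  values = []
  for name, value in fields.items():
    if isinstance(value, str):
      # String fields such as '62544' are parsed as integers.
      value = int(value)
    elif not isinstance(value, (int, float)):
      raise TypeError(
          f'Field {name!r} has unsupported type {type(value).__name__}.')
    # Cast every value to float so that the list has a single element type.
    values.append(float(value))
  return values
```

For example, `row_values_to_numbers({'user_id': '62544', 'product_id': 14235})` returns `[62544.0, 14235.0]`. Casting everything to `float` is one possible design choice; casting to `int` instead, as the suggestion proposes, works only if no enriched field (such as `sale_price`) carries a fractional value.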