riteshghorse commented on code in PR #30315:
URL: https://github.com/apache/beam/pull/30315#discussion_r1489958064
##########
examples/notebooks/beam-ml/bigtable_enrichment_transform.ipynb:
##########
@@ -0,0 +1,854 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "id": "fFjof1NgAJwu",
+ "cellView": "form"
+ },
+ "outputs": [],
+ "source": [
+ "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+ "\n",
+ "# Licensed to the Apache Software Foundation (ASF) under one\n",
+ "# or more contributor license agreements. See the NOTICE file\n",
+ "# distributed with this work for additional information\n",
+ "# regarding copyright ownership. The ASF licenses this file\n",
+ "# to you under the Apache License, Version 2.0 (the\n",
+ "# \"License\"); you may not use this file except in compliance\n",
+ "# with the License. You may obtain a copy of the License at\n",
+ "#\n",
+ "# http://www.apache.org/licenses/LICENSE-2.0\n",
+ "#\n",
+ "# Unless required by applicable law or agreed to in writing,\n",
+ "# software distributed under the License is distributed on an\n",
+ "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+ "# KIND, either express or implied. See the License for the\n",
+ "# specific language governing permissions and limitations\n",
+ "# under the License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "A8xNRyZMW1yK"
+ },
+ "source": [
+ "# Use Apache Beam and Bigtable to enrich data\n",
+ "\n",
+ "<table align=\"left\">\n",
+ " <td>\n",
+ " <a target=\"_blank\" href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/bigtable_enrichment_transform.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+ " </td>\n",
+ " <td>\n",
+ " <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/bigtable_enrichment_transform.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+ " </td>\n",
+ "</table>\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "HrCtxslBGK8Z"
+ },
+ "source": [
+ "This notebook shows how to use the Apache Beam [enrichment transform](https://beam.apache.org/releases/pydoc/2.54.0/apache_beam.transforms.enrichment.html#apache_beam.transforms.enrichment.Enrichment) with [Bigtable](https://cloud.google.com/bigtable) to enrich data. The enrichment transform is a turnkey transform in Apache Beam that lets you enrich data by using a key-value lookup. This transform has the following features:\n",
+ "\n",
+ "- The transform has a built-in Apache Beam handler that interacts with Bigtable to get data to use in the enrichment.\n",
+ "- The enrichment transform uses client-side throttling to rate-limit requests. Requests are retried with exponential backoff by using a default retry strategy. You can configure rate limiting to suit your use case."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "This notebook demonstrates the following ecommerce use case:\n",
+ "\n",
+ "A stream of online transactions from [Pub/Sub](https://cloud.google.com/pubsub/docs/guides) contains the following fields: `sale_id`, `product_id`, `customer_id`, `quantity`, and `price`. Additional customer demographic data is stored in a separate Bigtable cluster. The demographic data is used to enrich the event stream from Pub/Sub. Then, the enriched data is used to predict the next product to recommend to a customer."
+ ],
+ "metadata": {
+ "id": "ltn5zrBiGS9C"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "gVCtGOKTHMm4"
+ },
+ "source": [
+ "## Before you begin\n",
+ "Set up your environment and download dependencies."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "YDHPlMjZRuY0"
+ },
+ "source": [
+ "### Install Apache Beam\n",
+ "To use the enrichment transform with the built-in Bigtable handler, install the Apache Beam SDK version 2.54.0 or later."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "id": "jBakpNZnAhqk"
+ },
+ "outputs": [],
+ "source": [
+ "!pip install torch\n",
+ "!pip install apache_beam[interactive,gcp]==2.54.0 --quiet"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import datetime\n",
+ "import json\n",
+ "import math\n",
+ "\n",
+ "from typing import Any\n",
+ "from typing import Dict\n",
+ "\n",
+ "import torch\n",
+ "from google.cloud import pubsub_v1\n",
+ "from google.cloud.bigtable import Client\n",
+ "from google.cloud.bigtable import column_family\n",
+ "\n",
+ "import apache_beam as beam\n",
+ "import apache_beam.runners.interactive.interactive_beam as ib\n",
+ "from apache_beam.ml.inference.base import RunInference\n",
+ "from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor\n",
+ "from apache_beam.options import pipeline_options\n",
+ "from apache_beam.runners.interactive.interactive_runner import InteractiveRunner\n",
+ "from apache_beam.transforms.enrichment import Enrichment\n",
+ "from apache_beam.transforms.enrichment_handlers.bigtable import BigTableEnrichmentHandler"
+ ],
+ "metadata": {
+ "id": "SiJii48A2Rnb"
+ },
+ "execution_count": 3,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "X80jy3FqHjK4"
+ },
+ "source": [
+ "### Authenticate with Google Cloud\n",
+ "This notebook reads data from Pub/Sub and Bigtable. To use your Google Cloud account, authenticate this notebook."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "id": "Kz9sccyGBqz3"
+ },
+ "outputs": [],
+ "source": [
+ "from google.colab import auth\n",
+ "auth.authenticate_user()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Provide values for the `PROJECT_ID`, `INSTANCE_ID`, and `TABLE_ID` fields to use with Bigtable."
Review Comment:
this is correct, thanks!
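
The following is a minimal plain-Python sketch of what the enrichment step in this notebook does for each element. It performs a key-value lookup on `customer_id`, retries transient failures with exponential backoff, and merges the looked-up columns into the sale event. The sketch rests on a few assumptions:

- A dict stands in for the Bigtable table.
- `CUSTOMERS`, `TransientError`, `lookup_with_backoff`, and `enrich` are hypothetical names, not Beam APIs.
- The sketch does not reproduce how `Enrichment` or `BigTableEnrichmentHandler` are implemented.

```python
import time
from typing import Any, Callable, Dict, Mapping

# Hypothetical in-memory stand-in for a Bigtable table (row key -> column values).
# In the notebook these rows live in Bigtable; nothing here talks to Bigtable.
CUSTOMERS: Dict[str, Dict[str, Any]] = {
    "1": {"customer_name": "Alice", "customer_age": 30},
}


class TransientError(Exception):
    """Hypothetical error type for a retryable lookup failure."""


def lookup_with_backoff(
    fetch: Callable[[str], Mapping[str, Any]],
    key: str,
    max_retries: int = 3,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, Any]:
    """Call fetch(key), retrying TransientError with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fetch(key)
        except TransientError:
            if attempt == max_retries:
                raise  # Out of retries: surface the error to the caller.
            # The delay doubles on each retry: base, 2*base, 4*base, ...
            sleep(base_delay_s * (2 ** attempt))
    raise RuntimeError("max_retries must be >= 0")


def enrich(
    event: Dict[str, Any],
    key_field: str,
    fetch: Callable[[str], Mapping[str, Any]],
    **backoff_kwargs: Any,
) -> Dict[str, Any]:
    """Merge the looked-up columns into the event (the event is unchanged on a miss)."""
    row = lookup_with_backoff(fetch, str(event[key_field]), **backoff_kwargs)
    return {**event, **row}


sale = {"sale_id": 10, "product_id": 7, "customer_id": 1, "quantity": 2, "price": 9.5}
enriched = enrich(sale, "customer_id", lambda k: CUSTOMERS.get(k, {}))
print(enriched)
```

Two behaviors here are choices made for this sketch only: returning the event unchanged when the key is missing, and doubling the delay on each retry. Check the Beam enrichment documentation for how the real transform and handler behave.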