rszper commented on code in PR #30783:
URL: https://github.com/apache/beam/pull/30783#discussion_r1543389740
##########
examples/notebooks/beam-ml/bigtable_enrichment_transform.ipynb:
##########
@@ -509,15 +509,48 @@
"id": "K41xhvmA5yQk"
},
"source": [
- "To establish a client for the Bigtable enrichment handler, replace
`<PROJECT_ID>`, `<INSTANCE_ID>`, and `<TABLE_ID>` with the appropriate values
for those fields. The `row_key` variable is the field name from the input row
that contains the row key to use when querying Bigtable.\n",
+ "Configure the `BigTableEnrichmentHandler` handler with the following
required parameters:\n",
+ "\n",
+ "* `project_id`: the Google Cloud project ID for the Bigtable
instance\n",
+ "* `instance_id`: the instance name of the Bigtable cluster\n",
+ "* `table_id`: the table ID of table containing relevant data\n",
+ "* `row_key`: The field name from the input row that contains the row
key to use when querying Bigtable.\n",
"\n",
- "To convert a `string` type to a `byte` type or a `byte` type to a
`string` type from Bigtable, you can configure additional options, such as
[`app_profile_id`](https://cloud.google.com/bigtable/docs/app-profiles),
[`row_filter`](https://cloud.google.com/python/docs/reference/bigtable/latest/row-filters),
and
[`encoding`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#apache_beam.transforms.enrichment_handlers.bigtable.BigTableEnrichmentHandler:~:text=for%20more%20details.-,encoding,-(str)%20%E2%80%93%20encoding)
type.\n",
+ "Optionally, you can use parameters to further configure the
`BigTableEnrichmentHandler` handler. For more information about the available
parameters, see [enrichment handler module
documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#module-apache_beam.transforms.enrichment_handlers.bigtable)."
Review Comment:
```suggestion
"Optionally, you can use parameters to further configure the
`BigTableEnrichmentHandler` handler. For more information about the available
parameters, see the [enrichment handler module
documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#module-apache_beam.transforms.enrichment_handlers.bigtable)."
```
##########
examples/notebooks/beam-ml/bigtable_enrichment_transform.ipynb:
##########
@@ -624,6 +642,44 @@
" return beam.Row(**enriched)"
]
},
+ {
+ "cell_type": "markdown",
+ "source": [
+ "To provide a `lambda` function for using a custom join with the
enrichment transform, see the following example.\n",
+ "\n",
+ "```\n",
+ "with beam.Pipeline() as p:\n",
+ " output = (p\n",
+ " ...\n",
+ " | \"Enrich with BigTable\" >>
Enrichment(bigtable_handler, join_fn=custom_join)\n",
+ " | \"RunInference\" >> RunInference(model_handler)\n",
+ " ...\n",
+ " )\n",
+ "```"
+ ],
+ "metadata": {
+ "id": "fe3bIclV1jZ5"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Because the enrichment transform make API calls to the remote
service, use the `timeout` parameter to specify a timeout duration of 10
seconds:\n",
Review Comment:
```suggestion
"Because the enrichment transform makes API calls to the remote
service, use the `timeout` parameter to specify a timeout duration of 10
seconds:\n",
```
##########
examples/notebooks/beam-ml/bigtable_enrichment_transform.ipynb:
##########
@@ -601,9 +617,11 @@
"id": "F-xjiP_pHWZr"
},
"source": [
- "To make a prediction, use the following fields: `product_id`,
`quantity`, `price`, `customer_id`, and `customer_location`. Retrieve the value
of the `customer_location` field from Bigtable.\n",
+ "The enrichment transform performs a
[`cross_join`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment.html#apache_beam.transforms.enrichment.cross_join)
by default. This will return the enriched row with the following fields:
`sale_id`, `customer_id`, `product_id`, `quantity`, `price`, and
`customer_location`.\n",
Review Comment:
```suggestion
"By default, the enrichment transform performs a
[`cross_join`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment.html#apache_beam.transforms.enrichment.cross_join).
This join returns the enriched row with the following fields: `sale_id`,
`customer_id`, `product_id`, `quantity`, `price`, and `customer_location`.\n",
```
##########
examples/notebooks/beam-ml/bigtable_enrichment_transform.ipynb:
##########
@@ -601,9 +617,11 @@
"id": "F-xjiP_pHWZr"
},
"source": [
- "To make a prediction, use the following fields: `product_id`,
`quantity`, `price`, `customer_id`, and `customer_location`. Retrieve the value
of the `customer_location` field from Bigtable.\n",
+ "The enrichment transform performs a
[`cross_join`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment.html#apache_beam.transforms.enrichment.cross_join)
by default. This will return the enriched row with the following fields:
`sale_id`, `customer_id`, `product_id`, `quantity`, `price`, and
`customer_location`.\n",
+ "\n",
+ "But for the ecommerce use case, to make a prediction, the trained
model needs the following fields: `product_id`, `quantity`, `price`,
`customer_id`, and `customer_location`.\n",
Review Comment:
```suggestion
"To make a prediction when running the ecommerce example, however,
the trained model needs the following fields: `product_id`, `quantity`,
`price`, `customer_id`, and `customer_location`.\n",
```
##########
examples/notebooks/beam-ml/bigtable_enrichment_transform.ipynb:
##########
@@ -601,9 +617,11 @@
"id": "F-xjiP_pHWZr"
},
"source": [
- "To make a prediction, use the following fields: `product_id`,
`quantity`, `price`, `customer_id`, and `customer_location`. Retrieve the value
of the `customer_location` field from Bigtable.\n",
+ "The enrichment transform performs a
[`cross_join`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment.html#apache_beam.transforms.enrichment.cross_join)
by default. This will return the enriched row with the following fields:
`sale_id`, `customer_id`, `product_id`, `quantity`, `price`, and
`customer_location`.\n",
+ "\n",
+ "But for the ecommerce use case, to make a prediction, the trained
model needs the following fields: `product_id`, `quantity`, `price`,
`customer_id`, and `customer_location`.\n",
"\n",
- "Because the enrichment transform performs a
[`cross_join`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment.html#apache_beam.transforms.enrichment.cross_join)
by default, design the custom join to enrich the input data. This design
ensures that the join includes only the specified fields."
+ "Design a custom join function that takes two dictionaries as input
and returns an enriched row that include these fields."
Review Comment:
```suggestion
"Therefore, to get the required fields for the ecommerce example,
design a custom join function that takes two dictionaries as input and returns
an enriched row that includes these fields."
```
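For reference, here is a minimal sketch of the kind of join function this paragraph describes. It assumes that the left dictionary holds the input row and that the right dictionary holds the Bigtable response with `customer_location` at the top level; the notebook's actual response layout may differ (for example, the value might be nested under a column family). The name `select_model_fields` and the sample values are hypothetical. In the notebook, the result would be wrapped in `beam.Row(**enriched)`; the sketch returns a plain dictionary so that it runs without Beam installed.

```python
from typing import Any, Dict

# Fields the trained model needs, as listed in the notebook text.
MODEL_FIELDS = (
    "product_id", "quantity", "price", "customer_id", "customer_location")


def select_model_fields(
    left: Dict[str, Any], right: Dict[str, Any]) -> Dict[str, Any]:
    """Merges the request (left) and response (right) dictionaries and
    keeps only the fields the model needs. Hypothetical helper."""
    # Values from the input row take precedence if a key appears in both.
    merged = {**right, **left}
    return {name: merged[name] for name in MODEL_FIELDS}


# Hypothetical sample data: the input row and an assumed flat response.
request = {
    "sale_id": 1, "customer_id": 1, "product_id": 1,
    "quantity": 1, "price": 100.0}
response = {"customer_location": "New York"}

enriched = select_model_fields(request, response)
print(enriched)
```

Unlike the default join, which returns every field, this version drops `sale_id` because the model does not use it.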