rszper commented on code in PR #30783:
URL: https://github.com/apache/beam/pull/30783#discussion_r1543219708
##########
examples/notebooks/beam-ml/bigtable_enrichment_transform.ipynb:
##########
@@ -509,15 +509,48 @@
"id": "K41xhvmA5yQk"
},
"source": [
- "To establish a client for the Bigtable enrichment handler, replace
`<PROJECT_ID>`, `<INSTANCE_ID>`, and `<TABLE_ID>` with the appropriate values
for those fields. The `row_key` variable is the field name from the input row
that contains the row key to use when querying Bigtable.\n",
+ "Configure the `BigTableEnrichmentHandler` handler with the following
required parameters:\n",
+ "\n",
+ "* `project_id`: the Google Cloud project ID for the Bigtable
instance\n",
+ "* `instance_id`: the instance name of the Bigtable cluster\n",
+ "* `table_id`: the table ID of table containing relevant data\n",
+ "* `row_key`: The field name from the input row that contains the row
key to use when querying Bigtable.\n",
"\n",
- "To convert a `string` type to a `byte` type or a `byte` type to a
`string` type from Bigtable, you can configure additional options, such as
[`app_profile_id`](https://cloud.google.com/bigtable/docs/app-profiles),
[`row_filter`](https://cloud.google.com/python/docs/reference/bigtable/latest/row-filters),
and
[`encoding`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#apache_beam.transforms.enrichment_handlers.bigtable.BigTableEnrichmentHandler:~:text=for%20more%20details.-,encoding,-(str)%20%E2%80%93%20encoding)
type.\n",
+ "Optionally, to provide additional configuration to the
`BigTableEnrichmentHandler` handler, see [module
docs](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#module-apache_beam.transforms.enrichment_handlers.bigtable)."
Review Comment:
```suggestion
"Optionally, you can use parameters to further configure the
`BigTableEnrichmentHandler` handler. For more information about the available
parameters, see the [enrichment handler module
documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#module-apache_beam.transforms.enrichment_handlers.bigtable)."
```
##########
examples/notebooks/beam-ml/bigtable_enrichment_transform.ipynb:
##########
@@ -509,15 +509,48 @@
"id": "K41xhvmA5yQk"
},
"source": [
- "To establish a client for the Bigtable enrichment handler, replace
`<PROJECT_ID>`, `<INSTANCE_ID>`, and `<TABLE_ID>` with the appropriate values
for those fields. The `row_key` variable is the field name from the input row
that contains the row key to use when querying Bigtable.\n",
+ "Configure the `BigTableEnrichmentHandler` handler with the following
required parameters:\n",
+ "\n",
+ "* `project_id`: the Google Cloud project ID for the Bigtable
instance\n",
+ "* `instance_id`: the instance name of the Bigtable cluster\n",
+ "* `table_id`: the table ID of table containing relevant data\n",
+ "* `row_key`: The field name from the input row that contains the row
key to use when querying Bigtable.\n",
"\n",
- "To convert a `string` type to a `byte` type or a `byte` type to a
`string` type from Bigtable, you can configure additional options, such as
[`app_profile_id`](https://cloud.google.com/bigtable/docs/app-profiles),
[`row_filter`](https://cloud.google.com/python/docs/reference/bigtable/latest/row-filters),
and
[`encoding`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#apache_beam.transforms.enrichment_handlers.bigtable.BigTableEnrichmentHandler:~:text=for%20more%20details.-,encoding,-(str)%20%E2%80%93%20encoding)
type.\n",
+ "Optionally, to provide additional configuration to the
`BigTableEnrichmentHandler` handler, see [module
docs](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#module-apache_beam.transforms.enrichment_handlers.bigtable)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "yFMcaf8i7TbI"
+ },
+ "source": [
+ "**Note:** When exceptions occur, by default, the logging severity is
set to warning
([`ExceptionLevel.WARN`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.utils.html#apache_beam.transforms.enrichment_handlers.utils.ExceptionLevel.WARN)).
To configure the severity to raise exceptions, set `exception_level` to
[`ExceptionLevel.RAISE`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.utils.html#apache_beam.transforms.enrichment_handlers.utils.ExceptionLevel.RAISE).
To ignore exceptions, set `exception_level` to
[`ExceptionLevel.QUIET`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.utils.html#apache_beam.transforms.enrichment_handlers.utils.ExceptionLevel.QUIET).\n",
"\n",
- "The default `encoding` type is `utf-8`.\n",
+ "The following example demonstrates how to set exception level in
`BigTableEnrichmentHandler`:\n",
Review Comment:
```suggestion
"The following example demonstrates how to set the exception level
in the `BigTableEnrichmentHandler` handler:\n",
```
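To make the three `exception_level` settings concrete, here is a minimal pure-Python sketch of the policy the note describes, assuming the setting applies when a lookup comes back without a matching row. `Level` and `check_row` are hypothetical stand-ins, not Beam's `ExceptionLevel` or the handler's internals, so the real messages and exception types may differ.

```python
import enum
import logging
from typing import Any, Dict, Optional

_LOGGER = logging.getLogger(__name__)


class Level(enum.Enum):
  """Hypothetical stand-in for Beam's ExceptionLevel values."""
  RAISE = 0
  WARN = 1
  QUIET = 2


def check_row(row_key: str, row: Optional[Dict[str, Any]],
              level: Level = Level.WARN) -> Dict[str, Any]:
  """Applies an exception level to a lookup that may have found no row."""
  if row is not None:
    return row
  message = f'No matching row found for row_key: {row_key}'
  if level is Level.RAISE:
    # RAISE: surface the problem as an exception.
    raise ValueError(message)
  if level is Level.WARN:
    # WARN (the default): log a warning and continue without enrichment data.
    _LOGGER.warning(message)
  # QUIET: continue without logging anything.
  return {}
```

With the default level, `check_row('42', None)` logs a warning and returns `{}`; with `Level.RAISE`, the same call raises `ValueError`.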
##########
examples/notebooks/beam-ml/bigtable_enrichment_transform.ipynb:
##########
@@ -509,15 +509,48 @@
"id": "K41xhvmA5yQk"
},
"source": [
- "To establish a client for the Bigtable enrichment handler, replace
`<PROJECT_ID>`, `<INSTANCE_ID>`, and `<TABLE_ID>` with the appropriate values
for those fields. The `row_key` variable is the field name from the input row
that contains the row key to use when querying Bigtable.\n",
+ "Configure the `BigTableEnrichmentHandler` handler with the following
required parameters:\n",
+ "\n",
+ "* `project_id`: the Google Cloud project ID for the Bigtable
instance\n",
+ "* `instance_id`: the instance name of the Bigtable cluster\n",
+ "* `table_id`: the table ID of table containing relevant data\n",
+ "* `row_key`: The field name from the input row that contains the row
key to use when querying Bigtable.\n",
"\n",
- "To convert a `string` type to a `byte` type or a `byte` type to a
`string` type from Bigtable, you can configure additional options, such as
[`app_profile_id`](https://cloud.google.com/bigtable/docs/app-profiles),
[`row_filter`](https://cloud.google.com/python/docs/reference/bigtable/latest/row-filters),
and
[`encoding`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#apache_beam.transforms.enrichment_handlers.bigtable.BigTableEnrichmentHandler:~:text=for%20more%20details.-,encoding,-(str)%20%E2%80%93%20encoding)
type.\n",
+ "Optionally, to provide additional configuration to the
`BigTableEnrichmentHandler` handler, see [module
docs](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#module-apache_beam.transforms.enrichment_handlers.bigtable)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "yFMcaf8i7TbI"
+ },
+ "source": [
+ "**Note:** When exceptions occur, by default, the logging severity is
set to warning
([`ExceptionLevel.WARN`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.utils.html#apache_beam.transforms.enrichment_handlers.utils.ExceptionLevel.WARN)).
To configure the severity to raise exceptions, set `exception_level` to
[`ExceptionLevel.RAISE`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.utils.html#apache_beam.transforms.enrichment_handlers.utils.ExceptionLevel.RAISE).
To ignore exceptions, set `exception_level` to
[`ExceptionLevel.QUIET`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.utils.html#apache_beam.transforms.enrichment_handlers.utils.ExceptionLevel.QUIET).\n",
"\n",
- "The default `encoding` type is `utf-8`.\n",
+ "The following example demonstrates how to set exception level in
`BigTableEnrichmentHandler`:\n",
"\n",
- "\n"
+ "```\n",
+ "bigtable_handler = BigTableEnrichmentHandler(project_id=PROJECT_ID,\n",
+ "                                             instance_id=INSTANCE_ID,\n",
+ "                                             table_id=TABLE_ID,\n",
+ "                                             row_key=row_key,\n",
+ "                                             exception_level=ExceptionLevel.RAISE)\n",
+ "```"
]
},
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The key component of the Bigtable handler is the `row_key` parameter.
The `row_key` represent the field in input schema (`beam.Row`) that contains
the row key for a row in the table.\n",
Review Comment:
```suggestion
"The `row_key` parameter represents the field in the input schema
(`beam.Row`) that contains the row key for a row in the table.\n",
```
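As an illustration of what `row_key` refers to, the following pure-Python sketch looks up the named field in an input row and encodes its value to bytes, because Bigtable row keys are bytes. It assumes the input row is a plain `dict` (for a `beam.Row`, `row._asdict()` gives the same mapping) and uses `utf-8`, which the earlier notebook text gave as the default `encoding`. The function name and sample values are hypothetical.

```python
from typing import Any, Dict


def extract_row_key(element: Dict[str, Any], row_key: str,
                    encoding: str = 'utf-8') -> bytes:
  """Returns the Bigtable row key for one input row.

  Hypothetical helper: `row_key` names the input field whose value
  identifies the Bigtable row to read.
  """
  if row_key not in element:
    raise KeyError(f'Input row has no field named {row_key!r}')
  # Bigtable row keys are bytes, so encode the field value.
  return str(element[row_key]).encode(encoding)


# Made-up input row for illustration.
sale = {'product_id': 7, 'quantity': 2, 'price': 9.5, 'customer_id': 42}
print(extract_row_key(sale, row_key='customer_id'))  # b'42'
```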
##########
examples/notebooks/beam-ml/bigtable_enrichment_transform.ipynb:
##########
@@ -624,6 +640,44 @@
" return beam.Row(**enriched)"
]
},
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The following example demonstrates the code needed to provide a
lambda function for a custom join to the enrichment transform:\n",
+ "\n",
+ "```\n",
+ "with beam.Pipeline() as p:\n",
+ " output = (p\n",
+ " ...\n",
+ " | \"Enrich with BigTable\" >> Enrichment(bigtable_handler, join_fn=custom_join)\n",
+ " | \"RunInference\" >> RunInference(model_handler)\n",
+ " ...\n",
+ " )\n",
+ "```"
+ ],
+ "metadata": {
+ "id": "fe3bIclV1jZ5"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Since the enrichment transform make API calls to the remote service,
specify a timeout duration of 10 seconds using the timeout param:\n",
Review Comment:
```suggestion
"Because the enrichment transform makes API calls to the remote
service, use the `timeout` parameter to specify a timeout duration of 10
seconds:\n",
```
##########
examples/notebooks/beam-ml/bigtable_enrichment_transform.ipynb:
##########
@@ -509,15 +509,48 @@
"id": "K41xhvmA5yQk"
},
"source": [
- "To establish a client for the Bigtable enrichment handler, replace
`<PROJECT_ID>`, `<INSTANCE_ID>`, and `<TABLE_ID>` with the appropriate values
for those fields. The `row_key` variable is the field name from the input row
that contains the row key to use when querying Bigtable.\n",
+ "Configure the `BigTableEnrichmentHandler` handler with the following
required parameters:\n",
+ "\n",
+ "* `project_id`: the Google Cloud project ID for the Bigtable
instance\n",
+ "* `instance_id`: the instance name of the Bigtable cluster\n",
+ "* `table_id`: the table ID of table containing relevant data\n",
+ "* `row_key`: The field name from the input row that contains the row
key to use when querying Bigtable.\n",
"\n",
- "To convert a `string` type to a `byte` type or a `byte` type to a
`string` type from Bigtable, you can configure additional options, such as
[`app_profile_id`](https://cloud.google.com/bigtable/docs/app-profiles),
[`row_filter`](https://cloud.google.com/python/docs/reference/bigtable/latest/row-filters),
and
[`encoding`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#apache_beam.transforms.enrichment_handlers.bigtable.BigTableEnrichmentHandler:~:text=for%20more%20details.-,encoding,-(str)%20%E2%80%93%20encoding)
type.\n",
+ "Optionally, to provide additional configuration to the
`BigTableEnrichmentHandler` handler, see [module
docs](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#module-apache_beam.transforms.enrichment_handlers.bigtable)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "yFMcaf8i7TbI"
+ },
+ "source": [
+ "**Note:** When exceptions occur, by default, the logging severity is
set to warning
([`ExceptionLevel.WARN`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.utils.html#apache_beam.transforms.enrichment_handlers.utils.ExceptionLevel.WARN)).
To configure the severity to raise exceptions, set `exception_level` to
[`ExceptionLevel.RAISE`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.utils.html#apache_beam.transforms.enrichment_handlers.utils.ExceptionLevel.RAISE).
To ignore exceptions, set `exception_level` to
[`ExceptionLevel.QUIET`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.utils.html#apache_beam.transforms.enrichment_handlers.utils.ExceptionLevel.QUIET).\n",
"\n",
- "The default `encoding` type is `utf-8`.\n",
+ "The following example demonstrates how to set exception level in
`BigTableEnrichmentHandler`:\n",
"\n",
- "\n"
+ "```\n",
+ "bigtable_handler = BigTableEnrichmentHandler(project_id=PROJECT_ID,\n",
+ "                                             instance_id=INSTANCE_ID,\n",
+ "                                             table_id=TABLE_ID,\n",
+ "                                             row_key=row_key,\n",
+ "                                             exception_level=ExceptionLevel.RAISE)\n",
+ "```"
]
},
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The key component of the Bigtable handler is the `row_key` parameter.
The `row_key` represent the field in input schema (`beam.Row`) that contains
the row key for a row in the table.\n",
+ "\n",
+ "As of Apache Beam version 2.54.0, if the table uses composite row
keys, then you can:\n",
+ "* modify the input schema to contain the row key in the format
required by Bigtable.\n",
+ "* use a custom enrichment handler ([example handler with composite
row key
support](https://gist.github.com/riteshghorse/21f4480c1c545ea01166e6bdf4c183e1))."
Review Comment:
```suggestion
"* Use a custom enrichment handler. For more information, see the
[example handler with composite row key
support](https://gist.github.com/riteshghorse/21f4480c1c545ea01166e6bdf4c183e1)."
```
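For readers who don't open the gist, here is a minimal pure-Python sketch of the idea behind a custom handler for composite row keys: build the key from several input fields in a fixed order, then read the row with that key. The `#` delimiter, the field names, the sample data, and the in-memory `FakeTable` are assumptions for illustration. A real handler would subclass Beam's `EnrichmentSourceHandler` and query Bigtable, and the linked gist may be structured differently; the `(request, response)` pair returned here only loosely mirrors what enrichment handlers produce.

```python
from typing import Any, Dict, List, Optional, Tuple


class FakeTable:
  """Hypothetical in-memory stand-in for a Bigtable table."""

  def __init__(self, rows: Dict[bytes, Dict[str, Any]]):
    self._rows = rows

  def read_row(self, row_key: bytes) -> Optional[Dict[str, Any]]:
    return self._rows.get(row_key)


class CompositeKeyHandler:
  """Hypothetical handler that reads rows by a composite row key."""

  def __init__(self, table: FakeTable, key_fields: List[str],
               delimiter: str = '#', encoding: str = 'utf-8'):
    self._table = table
    self._key_fields = key_fields
    self._delimiter = delimiter
    self._encoding = encoding

  def build_row_key(self, request: Dict[str, Any]) -> bytes:
    # Join the field values in a fixed order, for example b'us-east#42'.
    parts = [str(request[field]) for field in self._key_fields]
    return self._delimiter.join(parts).encode(self._encoding)

  def __call__(
      self, request: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    row = self._table.read_row(self.build_row_key(request))
    # Pair the request with the fetched columns (empty if no row matched).
    return request, dict(row or {})


# Made-up table contents and input row for illustration.
table = FakeTable({b'us-east#42': {'customer_location': 'Boston'}})
handler = CompositeKeyHandler(table, key_fields=['region', 'customer_id'])
request, response = handler(
    {'region': 'us-east', 'customer_id': 42, 'product_id': 7})
print(response)  # {'customer_location': 'Boston'}
```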
##########
examples/notebooks/beam-ml/bigtable_enrichment_transform.ipynb:
##########
@@ -509,15 +509,48 @@
"id": "K41xhvmA5yQk"
},
"source": [
- "To establish a client for the Bigtable enrichment handler, replace
`<PROJECT_ID>`, `<INSTANCE_ID>`, and `<TABLE_ID>` with the appropriate values
for those fields. The `row_key` variable is the field name from the input row
that contains the row key to use when querying Bigtable.\n",
+ "Configure the `BigTableEnrichmentHandler` handler with the following
required parameters:\n",
+ "\n",
+ "* `project_id`: the Google Cloud project ID for the Bigtable
instance\n",
+ "* `instance_id`: the instance name of the Bigtable cluster\n",
+ "* `table_id`: the table ID of table containing relevant data\n",
+ "* `row_key`: The field name from the input row that contains the row
key to use when querying Bigtable.\n",
"\n",
- "To convert a `string` type to a `byte` type or a `byte` type to a
`string` type from Bigtable, you can configure additional options, such as
[`app_profile_id`](https://cloud.google.com/bigtable/docs/app-profiles),
[`row_filter`](https://cloud.google.com/python/docs/reference/bigtable/latest/row-filters),
and
[`encoding`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#apache_beam.transforms.enrichment_handlers.bigtable.BigTableEnrichmentHandler:~:text=for%20more%20details.-,encoding,-(str)%20%E2%80%93%20encoding)
type.\n",
+ "Optionally, to provide additional configuration to the
`BigTableEnrichmentHandler` handler, see [module
docs](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#module-apache_beam.transforms.enrichment_handlers.bigtable)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "yFMcaf8i7TbI"
+ },
+ "source": [
+ "**Note:** When exceptions occur, by default, the logging severity is
set to warning
([`ExceptionLevel.WARN`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.utils.html#apache_beam.transforms.enrichment_handlers.utils.ExceptionLevel.WARN)).
To configure the severity to raise exceptions, set `exception_level` to
[`ExceptionLevel.RAISE`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.utils.html#apache_beam.transforms.enrichment_handlers.utils.ExceptionLevel.RAISE).
To ignore exceptions, set `exception_level` to
[`ExceptionLevel.QUIET`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.utils.html#apache_beam.transforms.enrichment_handlers.utils.ExceptionLevel.QUIET).\n",
"\n",
- "The default `encoding` type is `utf-8`.\n",
+ "The following example demonstrates how to set exception level in
`BigTableEnrichmentHandler`:\n",
"\n",
- "\n"
+ "```\n",
+ "bigtable_handler = BigTableEnrichmentHandler(project_id=PROJECT_ID,\n",
+ "                                             instance_id=INSTANCE_ID,\n",
+ "                                             table_id=TABLE_ID,\n",
+ "                                             row_key=row_key,\n",
+ "                                             exception_level=ExceptionLevel.RAISE)\n",
+ "```"
]
},
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The key component of the Bigtable handler is the `row_key` parameter.
The `row_key` represent the field in input schema (`beam.Row`) that contains
the row key for a row in the table.\n",
+ "\n",
+ "As of Apache Beam version 2.54.0, if the table uses composite row
keys, then you can:\n",
Review Comment:
```suggestion
"Starting in Apache Beam version 2.54.0, if the table uses composite
row keys, then you can do the following tasks:\n",
```
##########
examples/notebooks/beam-ml/bigtable_enrichment_transform.ipynb:
##########
@@ -562,13 +584,7 @@
"source": [
"## Use the enrichment transform\n",
"\n",
- "To use the [enrichment
transform](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment.html#apache_beam.transforms.enrichment.Enrichment),
the
[`EnrichmentHandler`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment.html#apache_beam.transforms.enrichment.EnrichmentSourceHandler)
parameter is required. You can also use a configuration parameter to specify a
`lambda` for a join function, a timeout, a throttler, and a repeater (retry
strategy).\n",
- "\n",
- "\n",
- "* `join_fn`: A lambda function that takes dictionaries as input and
returns an enriched row (`Callable[[Dict[str, Any], Dict[str, Any]],
beam.Row]`). The enriched row specifies how to join the data fetched from the
API. Defaults to a
[cross-join](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment.html#apache_beam.transforms.enrichment.cross_join).\n",
- "* `timeout`: The number of seconds to wait for the request to be
completed by the API before timing out. Defaults to 30 seconds.\n",
- "* `throttler`: Specifies the throttling mechanism. The only
supported option is default client-side adaptive throttling.\n",
- "* `repeater`: Specifies the retry strategy when errors like
`TooManyRequests` and `TimeoutException` occur. Defaults to
[`ExponentialBackOffRepeater`](https://beam.apache.org/releases/pydoc/current/apache_beam.io.requestresponse.html#apache_beam.io.requestresponse.ExponentialBackOffRepeater).\n"
+ "To use the [enrichment
transform](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment.html#apache_beam.transforms.enrichment.Enrichment),
only the enrichment handler parameter is required."
Review Comment:
```suggestion
"To use the [enrichment
transform](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment.html#apache_beam.transforms.enrichment.Enrichment),
the enrichment handler parameter is the only required parameter."
```
##########
examples/notebooks/beam-ml/bigtable_enrichment_transform.ipynb:
##########
@@ -509,15 +509,48 @@
"id": "K41xhvmA5yQk"
},
"source": [
- "To establish a client for the Bigtable enrichment handler, replace
`<PROJECT_ID>`, `<INSTANCE_ID>`, and `<TABLE_ID>` with the appropriate values
for those fields. The `row_key` variable is the field name from the input row
that contains the row key to use when querying Bigtable.\n",
+ "Configure the `BigTableEnrichmentHandler` handler with the following
required parameters:\n",
+ "\n",
+ "* `project_id`: the Google Cloud project ID for the Bigtable
instance\n",
+ "* `instance_id`: the instance name of the Bigtable cluster\n",
+ "* `table_id`: the table ID of table containing relevant data\n",
+ "* `row_key`: The field name from the input row that contains the row
key to use when querying Bigtable.\n",
"\n",
- "To convert a `string` type to a `byte` type or a `byte` type to a
`string` type from Bigtable, you can configure additional options, such as
[`app_profile_id`](https://cloud.google.com/bigtable/docs/app-profiles),
[`row_filter`](https://cloud.google.com/python/docs/reference/bigtable/latest/row-filters),
and
[`encoding`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#apache_beam.transforms.enrichment_handlers.bigtable.BigTableEnrichmentHandler:~:text=for%20more%20details.-,encoding,-(str)%20%E2%80%93%20encoding)
type.\n",
+ "Optionally, to provide additional configuration to the
`BigTableEnrichmentHandler` handler, see [module
docs](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#module-apache_beam.transforms.enrichment_handlers.bigtable)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "yFMcaf8i7TbI"
+ },
+ "source": [
+ "**Note:** When exceptions occur, by default, the logging severity is
set to warning
([`ExceptionLevel.WARN`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.utils.html#apache_beam.transforms.enrichment_handlers.utils.ExceptionLevel.WARN)).
To configure the severity to raise exceptions, set `exception_level` to
[`ExceptionLevel.RAISE`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.utils.html#apache_beam.transforms.enrichment_handlers.utils.ExceptionLevel.RAISE).
To ignore exceptions, set `exception_level` to
[`ExceptionLevel.QUIET`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.utils.html#apache_beam.transforms.enrichment_handlers.utils.ExceptionLevel.QUIET).\n",
"\n",
- "The default `encoding` type is `utf-8`.\n",
+ "The following example demonstrates how to set exception level in
`BigTableEnrichmentHandler`:\n",
"\n",
- "\n"
+ "```\n",
+ "bigtable_handler = BigTableEnrichmentHandler(project_id=PROJECT_ID,\n",
+ "                                             instance_id=INSTANCE_ID,\n",
+ "                                             table_id=TABLE_ID,\n",
+ "                                             row_key=row_key,\n",
+ "                                             exception_level=ExceptionLevel.RAISE)\n",
+ "```"
]
},
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The key component of the Bigtable handler is the `row_key` parameter.
The `row_key` represent the field in input schema (`beam.Row`) that contains
the row key for a row in the table.\n",
+ "\n",
+ "As of Apache Beam version 2.54.0, if the table uses composite row
keys, then you can:\n",
+ "* modify the input schema to contain the row key in the format
required by Bigtable.\n",
Review Comment:
```suggestion
"* Modify the input schema to contain the row key in the format
required by Bigtable.\n",
```
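A minimal sketch of the first option: derive a new field that holds the composite key in the format the table uses, then pass that field's name (here `composite_key`) as `row_key`. The `#`-delimited format, the field names, and `add_composite_key` are hypothetical. In a Beam pipeline, a function like this could run in a `beam.Map` step ahead of the enrichment transform, producing a `beam.Row` (for example, `beam.Row(**row)`) instead of a `dict`.

```python
from typing import Any, Dict, Sequence


def add_composite_key(element: Dict[str, Any], fields: Sequence[str],
                      key_name: str = 'composite_key',
                      delimiter: str = '#') -> Dict[str, Any]:
  """Returns a copy of the input row with a composite row key field added.

  Hypothetical helper: the key format must match how the rows were
  written to the Bigtable table.
  """
  updated = dict(element)
  updated[key_name] = delimiter.join(str(element[f]) for f in fields)
  return updated


# Made-up input row for illustration.
sale = {'region': 'us-east', 'customer_id': 42, 'product_id': 7}
row = add_composite_key(sale, ['region', 'customer_id'])
print(row['composite_key'])  # us-east#42
```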
##########
examples/notebooks/beam-ml/bigtable_enrichment_transform.ipynb:
##########
@@ -624,6 +640,44 @@
" return beam.Row(**enriched)"
]
},
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The following example demonstrates the code needed to provide a
lambda function for a custom join to the enrichment transform:\n",
Review Comment:
```suggestion
"To provide a `lambda` function for using a custom join with the
enrichment transform, see the following example.\n",
```
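To show the shape of a custom `join_fn` (a callable that takes two dictionaries and returns the enriched row), the following pure-Python sketch keeps only the fields the model needs: `product_id`, `quantity`, `price`, `customer_id`, and `customer_location`. It returns a `dict` so it runs without Beam; the notebook's version returns `beam.Row(**enriched)`. The flat layout of `right` is an assumption, because the response from `BigTableEnrichmentHandler` may nest columns under their column family. A `lambda` that calls the same logic works equally well as `join_fn`.

```python
from typing import Any, Dict

# Input fields the model needs, as listed in the notebook.
REQUIRED_FIELDS = ('product_id', 'quantity', 'price', 'customer_id')


def custom_join(left: Dict[str, Any], right: Dict[str, Any]) -> Dict[str, Any]:
  """Joins an input row (left) with fetched data (right).

  Sketch only: assumes `right` is a flat mapping that contains
  'customer_location'.
  """
  # Copy only the required input fields, dropping anything else.
  enriched = {field: left[field] for field in REQUIRED_FIELDS}
  # Add the single field fetched from Bigtable.
  enriched['customer_location'] = right['customer_location']
  return enriched
```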
##########
examples/notebooks/beam-ml/bigtable_enrichment_transform.ipynb:
##########
@@ -509,15 +509,48 @@
"id": "K41xhvmA5yQk"
},
"source": [
- "To establish a client for the Bigtable enrichment handler, replace
`<PROJECT_ID>`, `<INSTANCE_ID>`, and `<TABLE_ID>` with the appropriate values
for those fields. The `row_key` variable is the field name from the input row
that contains the row key to use when querying Bigtable.\n",
+ "Configure the `BigTableEnrichmentHandler` handler with the following
required parameters:\n",
+ "\n",
+ "* `project_id`: the Google Cloud project ID for the Bigtable
instance\n",
+ "* `instance_id`: the instance name of the Bigtable cluster\n",
+ "* `table_id`: the table ID of table containing relevant data\n",
+ "* `row_key`: The field name from the input row that contains the row
key to use when querying Bigtable.\n",
"\n",
- "To convert a `string` type to a `byte` type or a `byte` type to a
`string` type from Bigtable, you can configure additional options, such as
[`app_profile_id`](https://cloud.google.com/bigtable/docs/app-profiles),
[`row_filter`](https://cloud.google.com/python/docs/reference/bigtable/latest/row-filters),
and
[`encoding`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#apache_beam.transforms.enrichment_handlers.bigtable.BigTableEnrichmentHandler:~:text=for%20more%20details.-,encoding,-(str)%20%E2%80%93%20encoding)
type.\n",
+ "Optionally, to provide additional configuration to the
`BigTableEnrichmentHandler` handler, see [module
docs](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.bigtable.html#module-apache_beam.transforms.enrichment_handlers.bigtable)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "yFMcaf8i7TbI"
+ },
+ "source": [
+ "**Note:** When exceptions occur, by default, the logging severity is
set to warning
([`ExceptionLevel.WARN`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.utils.html#apache_beam.transforms.enrichment_handlers.utils.ExceptionLevel.WARN)).
To configure the severity to raise exceptions, set `exception_level` to
[`ExceptionLevel.RAISE`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.utils.html#apache_beam.transforms.enrichment_handlers.utils.ExceptionLevel.RAISE).
To ignore exceptions, set `exception_level` to
[`ExceptionLevel.QUIET`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment_handlers.utils.html#apache_beam.transforms.enrichment_handlers.utils.ExceptionLevel.QUIET).\n",
"\n",
- "The default `encoding` type is `utf-8`.\n",
+ "The following example demonstrates how to set exception level in
`BigTableEnrichmentHandler`:\n",
"\n",
- "\n"
+ "```\n",
+ "bigtable_handler = BigTableEnrichmentHandler(project_id=PROJECT_ID,\n",
+ "                                             instance_id=INSTANCE_ID,\n",
+ "                                             table_id=TABLE_ID,\n",
+ "                                             row_key=row_key,\n",
+ "                                             exception_level=ExceptionLevel.RAISE)\n",
+ "```"
]
},
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The key component of the Bigtable handler is the `row_key` parameter.
The `row_key` represent the field in input schema (`beam.Row`) that contains
the row key for a row in the table.\n",
+ "\n",
+ "As of Apache Beam version 2.54.0, if the table uses composite row
keys, then you can:\n",
+ "* modify the input schema to contain the row key in the format
required by Bigtable.\n",
+ "* use a custom enrichment handler ([example handler with composite
row key
support](https://gist.github.com/riteshghorse/21f4480c1c545ea01166e6bdf4c183e1))."
Review Comment:
Should this link be pointing to your repo?
##########
examples/notebooks/beam-ml/bigtable_enrichment_transform.ipynb:
##########
@@ -601,9 +617,9 @@
"id": "F-xjiP_pHWZr"
},
"source": [
- "To make a prediction, use the following fields: `product_id`,
`quantity`, `price`, `customer_id`, and `customer_location`. Retrieve the value
of the `customer_location` field from Bigtable.\n",
+ "The enrichment transform performs a
[`cross_join`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment.html#apache_beam.transforms.enrichment.cross_join)
by default. To override this behavior, the transform accepts a `join_fn`
lambda function. The lambda function takes two dictionaries as input and
returns an enriched row.\n",
"\n",
- "Because the enrichment transform performs a
[`cross_join`](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.enrichment.html#apache_beam.transforms.enrichment.cross_join)
by default, design the custom join to enrich the input data. This design
ensures that the join includes only the specified fields."
+ "For our ecommerce use case, to make a prediction, it needs the
following fields: `product_id`, `quantity`, `price`, `customer_id`, and
`customer_location`. Design the custom join to enrich the input data such that
the enriched row has these fields."
Review Comment:
```suggestion
"For the ecommerce use case, to make a prediction, provide the
following fields: `product_id`, `quantity`, `price`, `customer_id`, and
`customer_location`. When you design the custom join to enrich the input data,
the enriched row must include these fields."
```
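To contrast the default with a custom join, here is a simplified pure-Python sketch of cross-join semantics as described for Beam's `cross_join`: every column from the fetched data is added to the input row, and when a field exists in both, the input value is kept. This is not Beam's function (which returns a `beam.Row`); check the linked pydoc for the exact behavior. The sample values are made up.

```python
from typing import Any, Dict


def cross_join_sketch(left: Dict[str, Any],
                      right: Dict[str, Any]) -> Dict[str, Any]:
  """Merges fetched columns (right) into a copy of the input row (left).

  Simplified sketch: fields already in `left` are never overwritten.
  """
  merged = dict(left)
  for key, value in right.items():
    if key not in merged:
      merged[key] = value
  return merged


merged = cross_join_sketch(
    {'customer_id': 42, 'price': 9.5},
    {'customer_id': 99, 'customer_location': 'Boston', 'zip': '02110'})
print(merged)
# {'customer_id': 42, 'price': 9.5, 'customer_location': 'Boston', 'zip': '02110'}
```

Unrelated columns such as `zip` end up in the enriched row, which is why a custom join is useful when the model expects an exact set of fields.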