gemini-code-assist[bot] commented on code in PR #38527:
URL: https://github.com/apache/beam/pull/38527#discussion_r3260669271
##########
website/www/site/content/en/documentation/io/managed-io.md:
##########
@@ -418,228 +421,228 @@ and Beam SQL is invoked via the Managed API under the
hood.
<code style="color: green">str</code>
</td>
<td>
- A list of host/port pairs to use for establishing the initial
connection to the Kafka cluster. The client will make use of all servers
irrespective of which servers are specified here for bootstrapping—this list
only impacts the initial hosts used to discover the full set of servers. |
Format: host1:port1,host2:port2,...
+ A list of host/port pairs to use for establishing the initial
connection to the Kafka cluster. The client will make use of all servers
irrespective of which servers are specified here for bootstrapping—this list
only impacts the initial hosts used to discover the full set of servers. This
list should be in the form `host1:port1,host2:port2,...`
Review Comment:

The description of `bootstrap_servers` here is formatted differently from the
one in the `KAFKA Write` section (line 590). Use the same format in both
sections so the documentation stays consistent.
```suggestion
A list of host/port pairs to use for establishing the initial
connection to the Kafka cluster. The client will make use of all servers
irrespective of which servers are specified here for bootstrapping—this list
only impacts the initial hosts used to discover the full set of servers. |
Format: host1:port1,host2:port2,...
```
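As an illustration of the `host1:port1,host2:port2,...` format that both descriptions refer to, here is a minimal sketch of a checker for such a string. The helper name `parse_bootstrap_servers` is hypothetical and is not part of Beam or the Kafka client; it only shows what a well-formed value looks like.

```python
def parse_bootstrap_servers(value: str) -> list[tuple[str, int]]:
    """Split 'host1:port1,host2:port2' into (host, port) pairs.

    Hypothetical helper for illustration only; not a Beam or Kafka API.
    """
    pairs = []
    for entry in value.split(","):
        entry = entry.strip()
        # Split on the last colon so the port is always the final component.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        pairs.append((host, int(port)))
    return pairs
```

A value such as `"broker-1:9092, broker-2:9093"` yields two pairs, while an entry without a port raises `ValueError`.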
##########
website/www/site/content/en/documentation/io/managed-io.md:
##########
@@ -418,228 +421,228 @@ and Beam SQL is invoked via the Managed API under the
hood.
<code style="color: green">str</code>
</td>
<td>
- A list of host/port pairs to use for establishing the initial
connection to the Kafka cluster. The client will make use of all servers
irrespective of which servers are specified here for bootstrapping—this list
only impacts the initial hosts used to discover the full set of servers. |
Format: host1:port1,host2:port2,...
+ A list of host/port pairs to use for establishing the initial
connection to the Kafka cluster. The client will make use of all servers
irrespective of which servers are specified here for bootstrapping—this list
only impacts the initial hosts used to discover the full set of servers. This
list should be in the form `host1:port1,host2:port2,...`
</td>
</tr>
<tr>
<td>
- <strong>format</strong>
+ <strong>topic</strong>
</td>
<td>
<code style="color: green">str</code>
</td>
<td>
- The encoding format for the data stored in Kafka. Valid options are:
RAW,JSON,AVRO,PROTO
+ n/a
Review Comment:

The description for the required field `topic` is "n/a". This should be
updated with a meaningful description.
```suggestion
The Kafka topic to read from.
```
##########
website/www/site/content/en/documentation/io/managed-io.md:
##########
@@ -809,6 +823,19 @@ and Beam SQL is invoked via the Managed API under the hood.
For a streaming pipeline, sets the limit for lifting bundles into the
direct write path.
</td>
</tr>
+ <tr>
+ <td>
+ distribution_mode
+ </td>
+ <td>
+ <code style="color: green">str</code>
+ </td>
+ <td>
+ Defines distribution of write data. Supported distributions:
+- none: don't shuffle rows (default)
+- hash: shuffle rows by partition key before writing data
Review Comment:

The markdown list inside the HTML `<td>` tag might not render correctly
because it lacks proper indentation and line breaks. For consistency with other
entries in this file, consider using `<br>` for line breaks and ensuring the
list items are indented.
```suggestion
Defines distribution of write data. Supported distributions:<br>
- none: don't shuffle rows (default)<br>
- hash: shuffle rows by partition key before writing data
```
##########
website/www/site/content/en/documentation/io/managed-io.md:
##########
@@ -864,6 +891,18 @@ and Beam SQL is invoked via the Managed API under the hood.
For more information on partition transforms, please visit
https://iceberg.apache.org/spec/#partition-transforms.
</td>
</tr>
+ <tr>
+ <td>
+ sort_fields
+ </td>
+ <td>
+ <code>list[<span style="color: green;">str</span>]</code>
+ </td>
+ <td>
+ Fields used to set the table's sort order, applied when the table is
created. Each entry has the form `<term> [asc|desc] [nulls first|nulls last]`,
where `<term>` is a field name or one of the partition transforms (e.g.
`bucket(col, 4)`, `day(ts)`). Direction defaults to ascending; null order
defaults to nulls-first for ascending and nulls-last for descending. Note: this
sets the table's declared sort order as metadata; it does not cause Beam to
physically sort records before writing.
+For more information on sort orders, please visit
https://iceberg.apache.org/spec/#sort-orders.
Review Comment:

There is a raw newline between the description and the link. This can cause
rendering issues within an HTML table cell in some markdown processors.
Consider using a `<br>` and maintaining consistent indentation.
```suggestion
Fields used to set the table's sort order, applied when the table is
created. Each entry has the form `<term> [asc|desc] [nulls first|nulls last]`,
where `<term>` is a field name or one of the partition transforms (e.g.
`bucket(col, 4)`, `day(ts)`). Direction defaults to ascending; null order
defaults to nulls-first for ascending and nulls-last for descending. Note: this
sets the table's declared sort order as metadata; it does not cause Beam to
physically sort records before writing.<br>
For more information on sort orders, please visit
https://iceberg.apache.org/spec/#sort-orders.
```
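The entry grammar and defaults in the `sort_fields` description can be sketched as a small parser. This is an assumption-laden illustration, not the parser Beam or Iceberg uses: `SortField` and `parse_sort_field` are hypothetical names, and the term is kept as an opaque string rather than being resolved to a field or transform.

```python
import re
from dataclasses import dataclass

# <term> [asc|desc] [nulls first|nulls last]; the term may contain spaces,
# e.g. "bucket(col, 4)", so it is matched lazily up to the optional suffixes.
_SORT_FIELD = re.compile(
    r"^\s*(?P<term>\S.*?)"
    r"(?:\s+(?P<direction>asc|desc))?"
    r"(?:\s+nulls\s+(?P<nulls>first|last))?\s*$",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class SortField:
    term: str
    ascending: bool
    nulls_first: bool


def parse_sort_field(entry: str) -> SortField:
    """Parse one sort_fields entry (hypothetical helper, illustration only)."""
    match = _SORT_FIELD.match(entry)
    if match is None:
        raise ValueError(f"not a valid sort field entry: {entry!r}")
    # Direction defaults to ascending.
    ascending = (match["direction"] or "asc").lower() == "asc"
    # Null order defaults to nulls-first for ascending, nulls-last for descending.
    nulls = match["nulls"]
    nulls_first = ascending if nulls is None else nulls.lower() == "first"
    return SortField(match["term"], ascending, nulls_first)
```

Under these assumptions, `day(ts)` parses as ascending with nulls first, and `bucket(col, 4) desc` as descending with nulls last.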
##########
website/www/site/content/en/documentation/io/managed-io.md:
##########
@@ -1076,17 +1104,6 @@ For more information on table properties, please visit
https://iceberg.apache.or
n/a
Review Comment:

The description for the required field `jdbc_url` is "n/a". This should be
updated with a meaningful description.
```suggestion
Connection URL for the JDBC sink.
```
##########
website/www/site/content/en/documentation/io/managed-io.md:
##########
@@ -650,7 +653,7 @@ and Beam SQL is invoked via the Managed API under the hood.
<code style="color: green">str</code>
</td>
<td>
- The schema in which the data is encoded in the Kafka topic. For AVRO
data, this is a schema defined with AVRO schema syntax
(https://avro.apache.org/docs/1.10.2/spec.html#schemas). For JSON data, this is
a schema defined with JSON-schema syntax (https://json-schema.org/). If a URL
to Confluent Schema Registry is provided, then this field is ignored, and the
schema is fetched from Confluent Schema Registry.
+ n/a
Review Comment:

The description for `schema` has been replaced with "n/a", which is a
regression from the previous version. It should be restored to provide
necessary context for users.
```suggestion
The schema in which the data is encoded in the Kafka topic. For AVRO
data, this is a schema defined with AVRO schema syntax
(https://avro.apache.org/docs/1.10.2/spec.html#schemas). For JSON data, this is
a schema defined with JSON-schema syntax (https://json-schema.org/). If a URL
to Confluent Schema Registry is provided, then this field is ignored, and the
schema is fetched from Confluent Schema Registry.
```
##########
website/www/site/content/en/documentation/io/managed-io.md:
##########
@@ -418,228 +421,228 @@ and Beam SQL is invoked via the Managed API under the
hood.
<code style="color: green">str</code>
</td>
<td>
- A list of host/port pairs to use for establishing the initial
connection to the Kafka cluster. The client will make use of all servers
irrespective of which servers are specified here for bootstrapping—this list
only impacts the initial hosts used to discover the full set of servers. |
Format: host1:port1,host2:port2,...
+ A list of host/port pairs to use for establishing the initial
connection to the Kafka cluster. The client will make use of all servers
irrespective of which servers are specified here for bootstrapping—this list
only impacts the initial hosts used to discover the full set of servers. This
list should be in the form `host1:port1,host2:port2,...`
</td>
</tr>
<tr>
<td>
- <strong>format</strong>
+ <strong>topic</strong>
</td>
<td>
<code style="color: green">str</code>
</td>
<td>
- The encoding format for the data stored in Kafka. Valid options are:
RAW,JSON,AVRO,PROTO
+ n/a
</td>
</tr>
<tr>
<td>
- <strong>topic</strong>
+ allow_duplicates
</td>
<td>
- <code style="color: green">str</code>
+ <code style="color: orange">boolean</code>
</td>
<td>
- n/a
+ If the Kafka read allows duplicates.
</td>
</tr>
<tr>
<td>
- file_descriptor_path
+ confluent_schema_registry_subject
</td>
<td>
<code style="color: green">str</code>
</td>
<td>
- The path to the Protocol Buffer File Descriptor Set file. This file is
used for schema definition and message serialization.
+ n/a
</td>
</tr>
<tr>
<td>
- message_name
+ confluent_schema_registry_url
</td>
<td>
<code style="color: green">str</code>
</td>
<td>
- The name of the Protocol Buffer message to be used for schema
extraction and data conversion.
+ n/a
</td>
</tr>
<tr>
<td>
- producer_config_updates
+ consumer_config_updates
</td>
<td>
<code>map[<span style="color: green;">str</span>, <span style="color:
green;">str</span>]</code>
</td>
<td>
- A list of key-value pairs that act as configuration parameters for
Kafka producers. Most of these configurations will not be needed, but if you
need to customize your Kafka producer, you may use this. See a detailed list:
https://docs.confluent.io/platform/current/installation/configuration/producer-configs.html
+ A list of key-value pairs that act as configuration parameters for
Kafka consumers. Most of these configurations will not be needed, but if you
need to customize your Kafka consumer, you may use this. See a detailed list:
https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html
</td>
</tr>
<tr>
<td>
- schema
+ file_descriptor_path
</td>
<td>
<code style="color: green">str</code>
</td>
<td>
- n/a
+ The path to the Protocol Buffer File Descriptor Set file. This file is
used for schema definition and message serialization.
</td>
</tr>
- </table>
-</div>
-
-### `KAFKA` Read
-
-<div class="table-container-wrapper">
- <table class="table table-bordered">
- <tr>
- <th>Configuration</th>
- <th>Type</th>
- <th>Description</th>
- </tr>
<tr>
<td>
- <strong>bootstrap_servers</strong>
+ format
</td>
<td>
<code style="color: green">str</code>
</td>
<td>
- A list of host/port pairs to use for establishing the initial
connection to the Kafka cluster. The client will make use of all servers
irrespective of which servers are specified here for bootstrapping—this list
only impacts the initial hosts used to discover the full set of servers. This
list should be in the form `host1:port1,host2:port2,...`
+ The encoding format for the data stored in Kafka. Valid options are:
RAW,STRING,AVRO,JSON,PROTO
</td>
</tr>
<tr>
<td>
- <strong>topic</strong>
+ message_name
</td>
<td>
<code style="color: green">str</code>
</td>
<td>
- n/a
+ The name of the Protocol Buffer message to be used for schema
extraction and data conversion.
</td>
</tr>
<tr>
<td>
- allow_duplicates
+ offset_deduplication
</td>
<td>
<code style="color: orange">boolean</code>
</td>
<td>
- If the Kafka read allows duplicates.
+ If the redistribute is using offset deduplication mode.
</td>
</tr>
<tr>
<td>
- confluent_schema_registry_subject
+ redistribute_by_record_key
</td>
<td>
- <code style="color: green">str</code>
+ <code style="color: orange">boolean</code>
</td>
<td>
- n/a
+ If the redistribute keys by the Kafka record key.
</td>
</tr>
<tr>
<td>
- confluent_schema_registry_url
+ redistribute_num_keys
</td>
<td>
- <code style="color: green">str</code>
+ <code style="color: #f54251">int32</code>
</td>
<td>
- n/a
+ The number of keys for redistributing Kafka inputs.
</td>
</tr>
<tr>
<td>
- consumer_config_updates
+ redistributed
</td>
<td>
- <code>map[<span style="color: green;">str</span>, <span style="color:
green;">str</span>]</code>
+ <code style="color: orange">boolean</code>
</td>
<td>
- A list of key-value pairs that act as configuration parameters for
Kafka consumers. Most of these configurations will not be needed, but if you
need to customize your Kafka consumer, you may use this. See a detailed list:
https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html
+ If the Kafka read should be redistributed.
</td>
</tr>
<tr>
<td>
- file_descriptor_path
+ schema
</td>
<td>
<code style="color: green">str</code>
</td>
<td>
- The path to the Protocol Buffer File Descriptor Set file. This file is
used for schema definition and message serialization.
+ The schema in which the data is encoded in the Kafka topic. For AVRO
data, this is a schema defined with AVRO schema syntax
(https://avro.apache.org/docs/1.10.2/spec.html#schemas). For JSON data, this is
a schema defined with JSON-schema syntax (https://json-schema.org/). If a URL
to Confluent Schema Registry is provided, then this field is ignored, and the
schema is fetched from Confluent Schema Registry.
</td>
</tr>
+ </table>
+</div>
+
+### `KAFKA` Write
+
+<div class="table-container-wrapper">
+ <table class="table table-bordered">
+ <tr>
+ <th>Configuration</th>
+ <th>Type</th>
+ <th>Description</th>
+ </tr>
<tr>
<td>
- format
+ <strong>bootstrap_servers</strong>
</td>
<td>
<code style="color: green">str</code>
</td>
<td>
- The encoding format for the data stored in Kafka. Valid options are:
RAW,STRING,AVRO,JSON,PROTO
+ A list of host/port pairs to use for establishing the initial
connection to the Kafka cluster. The client will make use of all servers
irrespective of which servers are specified here for bootstrapping—this list
only impacts the initial hosts used to discover the full set of servers. |
Format: host1:port1,host2:port2,...
</td>
</tr>
<tr>
<td>
- message_name
+ <strong>format</strong>
</td>
<td>
<code style="color: green">str</code>
</td>
<td>
- The name of the Protocol Buffer message to be used for schema
extraction and data conversion.
+ The encoding format for the data stored in Kafka. Valid options are:
RAW,JSON,AVRO,PROTO
</td>
</tr>
<tr>
<td>
- offset_deduplication
+ <strong>topic</strong>
</td>
<td>
- <code style="color: orange">boolean</code>
+ <code style="color: green">str</code>
</td>
<td>
- If the redistribute is using offset deduplication mode.
+ n/a
Review Comment:

The description for the required field `topic` is "n/a". This should be
updated with a meaningful description.
```suggestion
The Kafka topic to write to.
```
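To make the point about required fields concrete, the sketch below checks a `KAFKA` Write configuration for missing values. It assumes that the fields shown in bold in the table (`bootstrap_servers`, `format`, `topic`) are the required ones; the helper `missing_required` and the sample values are hypothetical, and the check is not how Beam validates configuration.

```python
# Assumption: the bold entries in the KAFKA Write table are the required fields.
REQUIRED_KAFKA_WRITE_FIELDS = ("bootstrap_servers", "format", "topic")


def missing_required(config: dict, required=REQUIRED_KAFKA_WRITE_FIELDS) -> list[str]:
    """Return the required field names that are absent or empty in config."""
    return [name for name in required if not config.get(name)]


# Example configuration with placeholder values and no topic set.
config = {
    "bootstrap_servers": "broker-1:9092,broker-2:9092",
    "format": "JSON",
}
```

Calling `missing_required(config)` on this example reports `topic`, the field whose description the comment asks to fill in.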
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.