damccorm commented on code in PR #33124:
URL: https://github.com/apache/beam/pull/33124#discussion_r1842936098


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadSchemaTransformProvider.java:
##########
@@ -76,40 +76,34 @@ public String description() {
         + "\n"
         + "Example configuration for performing a read using a SQL query: ::\n"
         + "\n"
-        + "    pipeline:\n"
-        + "      transforms:\n"
-        + "        - type: ReadFromSpanner\n"
-        + "          config:\n"
-        + "            instance_id: 'my-instance-id'\n"
-        + "            database_id: 'my-database'\n"
-        + "            query: 'SELECT * FROM table'\n"
+        + "    - type: ReadFromSpanner\n"
+        + "      config:\n"
+        + "        instance_id: 'my-instance-id'\n"
+        + "        database_id: 'my-database'\n"
+        + "        query: 'SELECT * FROM table'\n"

Review Comment:
   Is there a reason to pull these out of the pipeline definition? I don't 
really have an opinion one way or another, but it does crowd the PR a bit FWIW



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadSchemaTransformProvider.java:
##########
@@ -76,40 +76,34 @@ public String description() {
         + "\n"
         + "Example configuration for performing a read using a SQL query: ::\n"
         + "\n"
-        + "    pipeline:\n"
-        + "      transforms:\n"
-        + "        - type: ReadFromSpanner\n"
-        + "          config:\n"
-        + "            instance_id: 'my-instance-id'\n"
-        + "            database_id: 'my-database'\n"
-        + "            query: 'SELECT * FROM table'\n"
+        + "    - type: ReadFromSpanner\n"
+        + "      config:\n"
+        + "        instance_id: 'my-instance-id'\n"
+        + "        database_id: 'my-database'\n"
+        + "        query: 'SELECT * FROM table'\n"

Review Comment:
   (not asking for a revert here, but suggesting this moving forward)



##########
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcReadSchemaTransformProvider.java:
##########
@@ -241,7 +368,7 @@ public abstract static class Builder {
 
       public abstract Builder setConnectionInitSql(List<String> value);
 
-      public abstract Builder setFetchSize(Short value);
+      public abstract Builder setFetchSize(Integer value);

Review Comment:
   Why did we change the typing here?



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadSchemaTransformProvider.java:
##########
@@ -76,40 +76,34 @@ public String description() {
         + "\n"
         + "Example configuration for performing a read using a SQL query: ::\n"
         + "\n"
-        + "    pipeline:\n"
-        + "      transforms:\n"
-        + "        - type: ReadFromSpanner\n"
-        + "          config:\n"
-        + "            instance_id: 'my-instance-id'\n"
-        + "            database_id: 'my-database'\n"
-        + "            query: 'SELECT * FROM table'\n"
+        + "    - type: ReadFromSpanner\n"
+        + "      config:\n"
+        + "        instance_id: 'my-instance-id'\n"
+        + "        database_id: 'my-database'\n"
+        + "        query: 'SELECT * FROM table'\n"

Review Comment:
   I think we should try to avoid PRs with changes that are independent of each 
other (2 PRs will generally be quicker/easier - go/small-cls), and this PR 
feels like it has a lot going on



##########
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcWriteSchemaTransformProvider.java:
##########
@@ -172,71 +312,55 @@ public PCollectionRowTuple expand(PCollectionRowTuple input) {
   @DefaultSchema(AutoValueSchema.class)
   public abstract static class JdbcWriteSchemaTransformConfiguration implements Serializable {
 
-    @Nullable
-    public abstract String getDriverClassName();

Review Comment:
   I also think the same is true of the validate change; again, please correct me if that's wrong



##########
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcReadSchemaTransformProvider.java:
##########
@@ -49,33 +50,174 @@ public class JdbcReadSchemaTransformProvider
     extends TypedSchemaTransformProvider<
         JdbcReadSchemaTransformProvider.JdbcReadSchemaTransformConfiguration> {
 
+  @Override
+  public @UnknownKeyFor @NonNull @Initialized String identifier() {
+    return "beam:schematransform:org.apache.beam:jdbc_read:v1";
+  }
+
+  @Override
+  public String description() {
+    return baseDescription("JDBC")
+        + "\n"
+        + "This transform can be used to read from a JDBC source using either 
a given JDBC driver jar "
+        + "and class name, or by using one of the default packaged drivers 
given a `jdbc_type`.\n"
+        + "\n"
+        + "#### Using a default driver\n"
+        + "\n"
+        + "This transform comes packaged with drivers for several popular JDBC 
distributions. The following "
+        + "distributions can be declared as the `jdbc_type`: "
+        + JDBC_DRIVER_MAP.keySet().toString().replaceAll("[\\[\\]]", "")
+        + ".\n"
+        + "\n"
+        + "For example, reading a MySQL source using a SQL query: ::"
+        + "\n"
+        + "    - type: ReadFromJdbc\n"
+        + "      config:\n"
+        + "        jdbc_type: mysql\n"
+        + "        url: \"jdbc:mysql://my-host:3306/database\"\n"
+        + "        query: \"SELECT * FROM table\"\n"
+        + "\n"
+        + "\n"
+        + "**Note**: See the following transforms which are built on top of 
this transform and simplify "
+        + "this logic for several popular JDBC distributions:\n\n"
+        + " - ReadFromMySql\n"
+        + " - ReadFromPostgres\n"
+        + " - ReadFromOracle\n"
+        + " - ReadFromSqlServer\n"
+        + "\n"
+        + "#### Declaring custom JDBC drivers\n"
+        + "\n"
+        + "If reading from a JDBC source not listed above, or if it is 
necessary to use a custom driver not "
+        + "packaged with Beam, one must define a JDBC driver and class name.\n"
+        + "\n"
+        + "For example, reading a MySQL source table: ::"
+        + "\n"
+        + "    - type: ReadFromJdbc\n"
+        + "      config:\n"
+        + "        driver_jars: \"path/to/some/jdbc.jar\"\n"
+        + "        driver_class_name: \"com.mysql.jdbc.Driver\"\n"
+        + "        url: \"jdbc:mysql://my-host:3306/database\"\n"
+        + "        table: \"my-table\"\n"
+        + "\n"
+        + "#### Connection Properties\n"
+        + "\n"
+        + "Connection properties are properties sent to the Driver used to 
connect to the JDBC source. For example, "
+        + "to set the character encoding to UTF-8, one could write: ::\n"
+        + "\n"
+        + "    - type: ReadFromJdbc\n"
+        + "      config:\n"
+        + "        connectionProperties: \"characterEncoding=UTF-8;\"\n"
+        + "        ...\n"
+        + "All properties should be semi-colon-delimited (e.g. 
\"key1=value1;key2=value2;\")\n";
+  }
+
+  protected String baseDescription(String jdbcType) {
+    return String.format(
+        "Read from a %s source using a SQL query or by directly accessing " + 
"a single table.\n",
+        jdbcType);
+  }
+
+  protected String inheritedDescription(

Review Comment:
   Can we combine `baseDescription` and `inheritedDescription` into a single 
function if we just don't use `baseDescription` in `description` above?



##########
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcWriteSchemaTransformProvider.java:
##########
@@ -172,71 +312,55 @@ public PCollectionRowTuple expand(PCollectionRowTuple input) {
   @DefaultSchema(AutoValueSchema.class)
   public abstract static class JdbcWriteSchemaTransformConfiguration implements Serializable {
 
-    @Nullable
-    public abstract String getDriverClassName();

Review Comment:
   Note that reordering params like this makes this PR significantly harder to 
review - this would also probably be more helpful as a separate PR if there is 
a reason to do this kind of reordering; I think the change is fine (though I 
don't really understand the need), but doing it as part of a large PR with 
functional changes makes it hard to tell what is cosmetic and what is functional



##########
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcWriteSchemaTransformProvider.java:
##########
@@ -54,33 +55,177 @@ public class JdbcWriteSchemaTransformProvider
     extends TypedSchemaTransformProvider<
         JdbcWriteSchemaTransformProvider.JdbcWriteSchemaTransformConfiguration> {
 
+  @Override
+  public @UnknownKeyFor @NonNull @Initialized String identifier() {
+    return "beam:schematransform:org.apache.beam:jdbc_write:v1";
+  }
+
+  @Override
+  public String description() {
+    return baseDescription("JDBC")
+        + "\n"
+        + "This transform can be used to write to a JDBC sink using either a 
given JDBC driver jar "
+        + "and class name, or by using one of the default packaged drivers 
given a `jdbc_type`.\n"
+        + "\n"
+        + "#### Using a default driver\n"
+        + "\n"
+        + "This transform comes packaged with drivers for several popular JDBC 
distributions. The following "
+        + "distributions can be declared as the `jdbc_type`: "
+        + JDBC_DRIVER_MAP.keySet().toString().replaceAll("[\\[\\]]", "")
+        + ".\n"
+        + "\n"
+        + "For example, writing to a MySQL sink using a SQL query: ::"
+        + "\n"
+        + "    - type: WriteToJdbc\n"
+        + "      config:\n"
+        + "        jdbc_type: mysql\n"
+        + "        url: \"jdbc:mysql://my-host:3306/database\"\n"
+        + "        query: \"INSERT INTO table VALUES(?, ?)\"\n"
+        + "\n"
+        + "\n"
+        + "**Note**: See the following transforms which are built on top of 
this transform and simplify "
+        + "this logic for several popular JDBC distributions:\n\n"
+        + " - WriteToMySql\n"
+        + " - WriteToPostgres\n"
+        + " - WriteToOracle\n"
+        + " - WriteToSqlServer\n"
+        + "\n"
+        + "#### Declaring custom JDBC drivers\n"
+        + "\n"
+        + "If writing to a JDBC sink not listed above, or if it is necessary 
to use a custom driver not "
+        + "packaged with Beam, one must define a JDBC driver and class name.\n"
+        + "\n"
+        + "For example, writing to a MySQL table: ::"
+        + "\n"
+        + "    - type: WriteToJdbc\n"
+        + "      config:\n"
+        + "        driver_jars: \"path/to/some/jdbc.jar\"\n"
+        + "        driver_class_name: \"com.mysql.jdbc.Driver\"\n"
+        + "        url: \"jdbc:mysql://my-host:3306/database\"\n"
+        + "        table: \"my-table\"\n"
+        + "\n"
+        + "#### Connection Properties\n"
+        + "\n"
+        + "Connection properties are properties sent to the Driver used to 
connect to the JDBC source. For example, "
+        + "to set the character encoding to UTF-8, one could write: ::\n"
+        + "\n"
+        + "    - type: WriteToJdbc\n"
+        + "      config:\n"
+        + "        connectionProperties: \"characterEncoding=UTF-8;\"\n"
+        + "        ...\n"
+        + "All properties should be semi-colon-delimited (e.g. 
\"key1=value1;key2=value2;\")\n";
+  }
+
+  protected String baseDescription(String jdbcType) {
+    return String.format(
+        "Write to a %s sink using a SQL query or by directly accessing " + "a 
single table.\n",
+        jdbcType);
+  }
+
+  protected String inheritedDescription(
+      String prettyName, String transformName, String prefix, int port) {
+    return String.format(
+        "\n"
+            + "This is a special case of WriteToJdbc that includes the "
+            + "necessary %s Driver and classes.\n"
+            + "\n"
+            + "An example of using %s with SQL query: ::\n"
+            + "\n"
+            + "    - type: %s\n"
+            + "      config:\n"
+            + "        url: \"jdbc:%s://my-host:%d/database\"\n"
+            + "        query: \"INSERT INTO table VALUES(?, ?)\"\n"
+            + "\n"
+            + "It is also possible to read a table by specifying a table name. 
For example, the "
+            + "following configuration will perform a read on an entire table: 
::\n"
+            + "\n"
+            + "    - type: %s\n"
+            + "      config:\n"
+            + "        url: \"jdbc:%s://my-host:%d/database\"\n"
+            + "        table: \"my-table\"\n"
+            + "\n"
+            + "#### Advanced Usage\n"
+            + "\n"
+            + "It might be necessary to use a custom JDBC driver that is not 
packaged with this "
+            + "transform. If that is the case, see WriteToJdbc which "
+            + "allows for more custom configuration.",
+        prettyName, transformName, transformName, prefix, port, transformName, prefix, port);
+  }
+
   @Override
   protected @UnknownKeyFor @NonNull @Initialized Class<JdbcWriteSchemaTransformConfiguration>
       configurationClass() {
     return JdbcWriteSchemaTransformConfiguration.class;
   }
 
+  protected static void validateConfig(

Review Comment:
   Trying to follow this change - is there a reason to pull this out into a 
separate static method vs calling `config.validate()`?



##########
sdks/python/apache_beam/yaml/generate_yaml_docs.py:
##########
@@ -196,12 +196,28 @@ def io_grouping_key(transform_name):
 ]
 
 
+def add_transform_links(transform, description, provider_list):
+  """
+  Convert references of Providers to urls that link to their respective pages.
+  Avoid self-linking within a Transform page.
+  """
+  for p in provider_list:
+    description = re.sub(
+        rf"(?<!type: )\b(?!{transform}\b)\b{p}\b",
+        f'<a href="#{p.lower()}">{p}</a>',

Review Comment:
   I'm having a hard time following this regex. Could you add a comment breaking down what it is doing, please?
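
For reference, here's my rough reading of the pattern (a quick sketch with made-up names, so please correct me if I've misread it):

```python
import re

# Hypothetical values for illustration only -- not the real provider list.
transform = "ReadFromJdbc"   # the page being rendered (we don't want self-links)
p = "ReadFromMySql"          # a provider name to turn into a link

pattern = rf"(?<!type: )\b(?!{transform}\b)\b{p}\b"
# (?<!type: )        -- skip occurrences immediately preceded by "type: ",
#                       so the YAML snippets in descriptions stay untouched
# (?!{transform}\b)  -- skip the page's own transform name (no self-link)
# \b{p}\b            -- otherwise match the provider name as a whole word

text = "See ReadFromMySql, e.g.\n    - type: ReadFromMySql"
print(re.sub(pattern, f'<a href="#{p.lower()}">{p}</a>', text))
# See <a href="#readfrommysql">ReadFromMySql</a>, e.g.
#     - type: ReadFromMySql
```

If that reading is right, a short comment along those lines in the code would be plenty.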



##########
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcWriteSchemaTransformProvider.java:
##########
@@ -172,71 +312,55 @@ public PCollectionRowTuple expand(PCollectionRowTuple input) {
   @DefaultSchema(AutoValueSchema.class)
   public abstract static class JdbcWriteSchemaTransformConfiguration implements Serializable {
 
-    @Nullable
-    public abstract String getDriverClassName();

Review Comment:
   I think all the parameter changes here are just cosmetic; please let me know if that is wrong



##########
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcReadSchemaTransformProvider.java:
##########
@@ -49,33 +50,174 @@ public class JdbcReadSchemaTransformProvider
     extends TypedSchemaTransformProvider<
         JdbcReadSchemaTransformProvider.JdbcReadSchemaTransformConfiguration> {
 
+  @Override
+  public @UnknownKeyFor @NonNull @Initialized String identifier() {
+    return "beam:schematransform:org.apache.beam:jdbc_read:v1";
+  }
+
+  @Override
+  public String description() {
+    return baseDescription("JDBC")
+        + "\n"
+        + "This transform can be used to read from a JDBC source using either 
a given JDBC driver jar "
+        + "and class name, or by using one of the default packaged drivers 
given a `jdbc_type`.\n"
+        + "\n"
+        + "#### Using a default driver\n"
+        + "\n"
+        + "This transform comes packaged with drivers for several popular JDBC 
distributions. The following "
+        + "distributions can be declared as the `jdbc_type`: "
+        + JDBC_DRIVER_MAP.keySet().toString().replaceAll("[\\[\\]]", "")
+        + ".\n"
+        + "\n"
+        + "For example, reading a MySQL source using a SQL query: ::"
+        + "\n"
+        + "    - type: ReadFromJdbc\n"
+        + "      config:\n"
+        + "        jdbc_type: mysql\n"
+        + "        url: \"jdbc:mysql://my-host:3306/database\"\n"
+        + "        query: \"SELECT * FROM table\"\n"
+        + "\n"
+        + "\n"
+        + "**Note**: See the following transforms which are built on top of 
this transform and simplify "
+        + "this logic for several popular JDBC distributions:\n\n"
+        + " - ReadFromMySql\n"
+        + " - ReadFromPostgres\n"
+        + " - ReadFromOracle\n"
+        + " - ReadFromSqlServer\n"
+        + "\n"
+        + "#### Declaring custom JDBC drivers\n"
+        + "\n"
+        + "If reading from a JDBC source not listed above, or if it is 
necessary to use a custom driver not "
+        + "packaged with Beam, one must define a JDBC driver and class name.\n"
+        + "\n"
+        + "For example, reading a MySQL source table: ::"
+        + "\n"
+        + "    - type: ReadFromJdbc\n"
+        + "      config:\n"
+        + "        driver_jars: \"path/to/some/jdbc.jar\"\n"
+        + "        driver_class_name: \"com.mysql.jdbc.Driver\"\n"
+        + "        url: \"jdbc:mysql://my-host:3306/database\"\n"
+        + "        table: \"my-table\"\n"
+        + "\n"
+        + "#### Connection Properties\n"
+        + "\n"
+        + "Connection properties are properties sent to the Driver used to 
connect to the JDBC source. For example, "
+        + "to set the character encoding to UTF-8, one could write: ::\n"
+        + "\n"
+        + "    - type: ReadFromJdbc\n"
+        + "      config:\n"
+        + "        connectionProperties: \"characterEncoding=UTF-8;\"\n"
+        + "        ...\n"
+        + "All properties should be semi-colon-delimited (e.g. 
\"key1=value1;key2=value2;\")\n";
+  }
+
+  protected String baseDescription(String jdbcType) {
+    return String.format(
+        "Read from a %s source using a SQL query or by directly accessing " + 
"a single table.\n",
+        jdbcType);
+  }
+
+  protected String inheritedDescription(
+      String prettyName, String transformName, String prefix, int port) {

Review Comment:
   Nit: `prefix` (and to a lesser extent `port`) is not very descriptive here 
(you need to look at the contained string to understand what it is used for)



##########
sdks/java/io/jdbc/src/main/java/org/apache/beam/sdk/io/jdbc/JdbcWriteSchemaTransformProvider.java:
##########
@@ -54,33 +55,177 @@ public class JdbcWriteSchemaTransformProvider
     extends TypedSchemaTransformProvider<
         JdbcWriteSchemaTransformProvider.JdbcWriteSchemaTransformConfiguration> {
 
+  @Override
+  public @UnknownKeyFor @NonNull @Initialized String identifier() {
+    return "beam:schematransform:org.apache.beam:jdbc_write:v1";
+  }
+
+  @Override
+  public String description() {
+    return baseDescription("JDBC")
+        + "\n"
+        + "This transform can be used to write to a JDBC sink using either a 
given JDBC driver jar "
+        + "and class name, or by using one of the default packaged drivers 
given a `jdbc_type`.\n"
+        + "\n"
+        + "#### Using a default driver\n"
+        + "\n"
+        + "This transform comes packaged with drivers for several popular JDBC 
distributions. The following "
+        + "distributions can be declared as the `jdbc_type`: "
+        + JDBC_DRIVER_MAP.keySet().toString().replaceAll("[\\[\\]]", "")
+        + ".\n"
+        + "\n"
+        + "For example, writing to a MySQL sink using a SQL query: ::"
+        + "\n"
+        + "    - type: WriteToJdbc\n"
+        + "      config:\n"
+        + "        jdbc_type: mysql\n"
+        + "        url: \"jdbc:mysql://my-host:3306/database\"\n"
+        + "        query: \"INSERT INTO table VALUES(?, ?)\"\n"
+        + "\n"
+        + "\n"
+        + "**Note**: See the following transforms which are built on top of 
this transform and simplify "
+        + "this logic for several popular JDBC distributions:\n\n"
+        + " - WriteToMySql\n"
+        + " - WriteToPostgres\n"
+        + " - WriteToOracle\n"
+        + " - WriteToSqlServer\n"
+        + "\n"
+        + "#### Declaring custom JDBC drivers\n"
+        + "\n"
+        + "If writing to a JDBC sink not listed above, or if it is necessary 
to use a custom driver not "
+        + "packaged with Beam, one must define a JDBC driver and class name.\n"
+        + "\n"
+        + "For example, writing to a MySQL table: ::"
+        + "\n"
+        + "    - type: WriteToJdbc\n"
+        + "      config:\n"
+        + "        driver_jars: \"path/to/some/jdbc.jar\"\n"
+        + "        driver_class_name: \"com.mysql.jdbc.Driver\"\n"
+        + "        url: \"jdbc:mysql://my-host:3306/database\"\n"
+        + "        table: \"my-table\"\n"
+        + "\n"
+        + "#### Connection Properties\n"
+        + "\n"
+        + "Connection properties are properties sent to the Driver used to 
connect to the JDBC source. For example, "
+        + "to set the character encoding to UTF-8, one could write: ::\n"
+        + "\n"
+        + "    - type: WriteToJdbc\n"
+        + "      config:\n"
+        + "        connectionProperties: \"characterEncoding=UTF-8;\"\n"
+        + "        ...\n"
+        + "All properties should be semi-colon-delimited (e.g. 
\"key1=value1;key2=value2;\")\n";
+  }
+
+  protected String baseDescription(String jdbcType) {
+    return String.format(
+        "Write to a %s sink using a SQL query or by directly accessing " + "a 
single table.\n",
+        jdbcType);
+  }
+
+  protected String inheritedDescription(
+      String prettyName, String transformName, String prefix, int port) {
+    return String.format(
+        "\n"
+            + "This is a special case of WriteToJdbc that includes the "
+            + "necessary %s Driver and classes.\n"
+            + "\n"
+            + "An example of using %s with SQL query: ::\n"
+            + "\n"
+            + "    - type: %s\n"
+            + "      config:\n"
+            + "        url: \"jdbc:%s://my-host:%d/database\"\n"
+            + "        query: \"INSERT INTO table VALUES(?, ?)\"\n"
+            + "\n"
+            + "It is also possible to read a table by specifying a table name. 
For example, the "
+            + "following configuration will perform a read on an entire table: 
::\n"
+            + "\n"
+            + "    - type: %s\n"
+            + "      config:\n"
+            + "        url: \"jdbc:%s://my-host:%d/database\"\n"
+            + "        table: \"my-table\"\n"
+            + "\n"
+            + "#### Advanced Usage\n"
+            + "\n"
+            + "It might be necessary to use a custom JDBC driver that is not 
packaged with this "
+            + "transform. If that is the case, see WriteToJdbc which "
+            + "allows for more custom configuration.",
+        prettyName, transformName, transformName, prefix, port, transformName, prefix, port);
+  }
+
   @Override
   protected @UnknownKeyFor @NonNull @Initialized Class<JdbcWriteSchemaTransformConfiguration>
       configurationClass() {
     return JdbcWriteSchemaTransformConfiguration.class;
   }
 
+  protected static void validateConfig(

Review Comment:
   I think I'd actually prefer we leave this unless there's a reason to change 
it since it is technically public



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
