hermanmak commented on code in PR #24962:
URL: https://github.com/apache/beam/pull/24962#discussion_r1080964125


##########
website/www/site/content/en/documentation/io/io-standards.md:
##########
@@ -0,0 +1,1452 @@
+---
+title: "IO Standards"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# I/O Standards
+
+## Overview
+
+This Apache Beam I/O Standards document lays out prescriptive guidance for first-party and third-party (1P/3P) developers building an Apache Beam I/O connector. These guidelines aim to establish best practices covering documentation, development and testing in a simple and concise manner.
+
+
+### What are built-in I/O Connectors?
+
+An I/O connector (I/O) living in the Apache Beam Github repository is known as a **Built-in I/O connector**. Built-in I/Os have their [integration tests](#integration-tests) and performance tests routinely run by the Google Cloud Dataflow Team using the Dataflow Runner, with metrics published publicly for [reference](#dashboard). Unless explicitly stated otherwise, the following guidelines apply to both built-in and custom I/O connectors.
+
+
+# Guidance
+
+
+## Documentation
+
+This section lays out the superset of all documentation that is expected to be made available with an I/O. The Apache Beam documentation referenced throughout this section can be found [here](https://beam.apache.org/documentation/). A good example to follow is the built-in [Snowflake I/O](https://beam.apache.org/documentation/io/built-in/snowflake/).
+
+
+### Built-in I/O
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-io-standards">
+   <tr>
+      <td>
+         <p>Provide code docs for the relevant language of the I/O. These should also link to any external sources of information within the Apache Beam site or external documentation location.
+         <p>Examples:
+         <ul>
+            <li><a href="https://beam.apache.org/releases/javadoc/current/overview-summary.html">Java doc</a>
+            <li><a href="https://beam.apache.org/releases/pydoc/current/">Python doc</a>
+            <li><a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam">Go doc</a>
+         </ul>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Add a new page under <strong>I/O connector guides</strong> that covers specific tips and configurations. The following shows those for <a href="https://beam.apache.org/documentation/io/built-in/parquet/">Parquet</a>, <a href="https://beam.apache.org/documentation/io/built-in/hadoop/">Hadoop</a> and others.
+         <p>Examples:
+         <p><img src="/images/io-standards/io-connector-guides-screenshot.png" 
width="" alt="I/O connector guides screenshot" title="I/O connector guides 
screenshot"></img>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Formatting of the section headers in your Javadoc/Pythondoc should 
be consistent throughout such that programmatic information extraction for 
other pages can be enabled in the future.
+         <p>Example <strong>subset</strong> of sections to include in your 
page in order:
+         <ol>
+            <li>Before you start
+            <li>{Connector}IO basics
+            <li>Supported Features
+               <ol>
+                  <li>Relational
+                  </li>
+               </ol>
+            <li>Authentication
+            <li>Reading from {Connector}
+            <li>Writing to {Connector}
+            <li><a href="#unit-tests">Resource scalability</a>
+            <li>Limitations
+            <li>Reporting an Issue
+            </li>
+         </ol>
+         <p>Example:
+         <p>The KafkaIO <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kafka/KafkaIO.html">JavaDoc</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>I/O Connectors should include a note indicating the <a href="https://2022.beamsummit.org/sessions/relational-beam/">Relational Features</a> they support in their page under <strong>I/O connector guides</strong>.
+         <p>Relational Features are concepts that can help improve efficiency and can optionally be implemented by an I/O Connector. Using end-user-supplied pipeline configuration (<a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/io/SchemaIO.html">SchemaIO</a>) and user query (<a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/FieldAccessDescriptor.html">FieldAccessDescriptor</a>) data, relational theory is applied to derive improvements such as faster pipeline execution, lower operation costs and less data read/written.
+         <p>Example table:
+         <p><img src="/images/io-standards/io-supported-relational-features-table.png" width="" alt="Supported Relational Features" title="Supported Relational Features">
+         <p>Example implementations:
+         <p>BigQueryIO <a href="https://github.com/apache/beam/blob/5bb13fa35b9bc36764895c57f23d3890f0f1b567/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1813">Column Pruning</a> via ProjectionPushdown to return only necessary columns indicated by an end user's query. This is achieved using the BigQuery DirectRead API.
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Add a page under <strong>Common pipeline patterns</strong>, if 
necessary, outlining common usage patterns involving your I/O.
+         <p><a href="https://beam.apache.org/documentation/patterns/bigqueryio/">https://beam.apache.org/documentation/patterns/bigqueryio/</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Update <strong>I/O Connectors</strong> with your I/O’s information
+         <p>Example:
+         <p><a href="https://beam.apache.org/documentation/io/connectors/#built-in-io-connectors">https://beam.apache.org/documentation/io/connectors/#built-in-io-connectors</a>
+         <p><img src="/images/io-standards/io-supported-via-screenshot.png" width="" alt="I/O Connectors table screenshot" title="I/O Connectors table screenshot">
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Provide setup steps to use the I/O, under a <strong>Before you start</strong> header
+         <p>Example:
+         <p><a href="https://beam.apache.org/documentation/io/built-in/parquet/#before-you-start">https://beam.apache.org/documentation/io/built-in/parquet/#before-you-start</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Include a canonical read/write code snippet after the initial description for each supported language. The below example shows Hadoop with examples for Java.
+         <p>Example:
+         <p><a href="https://beam.apache.org/documentation/io/built-in/hadoop/#reading-using-hadoopformatio">https://beam.apache.org/documentation/io/built-in/hadoop/#reading-using-hadoopformatio</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Indicate how timestamps for elements are assigned. This includes batch sources, to allow for future I/Os which may provide more useful information than current_time().
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Indicate how timestamps are advanced; for Batch sources this will 
be marked as n/a in most cases.
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Outline any temporary resources (for example, files) that the 
connector will create.
+         <p>Example:
+         <p>BigQuery batch loads first create a temporary GCS location
+         <p><a href="https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L455">https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L455</a>
+      </td>
+  </tr>
+  <tr>
+      <td>
+         <p>Provide, under an <strong>Authentication</strong> subheader, how to acquire partner authorization material to securely access the source/sink.
+         <p>Example:
+         <p><a href="https://beam.apache.org/documentation/io/built-in/snowflake/#authentication">https://beam.apache.org/documentation/io/built-in/snowflake/#authentication</a>
+         <p>BigQuery names this topic permissions, but it covers similar ground:
+         <p><a href="https://beam.apache.org/releases/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html">https://beam.apache.org/releases/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html</a>
+      </td>
+  </tr>
+  <tr>
+      <td>
+         <p>I/Os should provide links to the Source/Sink documentation within the <strong>Before you start</strong> header
+         <p>Example:
+         <p><a href="https://beam.apache.org/documentation/io/built-in/snowflake/">https://beam.apache.org/documentation/io/built-in/snowflake/</a>
+      </td>
+  </tr>
+  <tr>
+      <td>
+         <p>Indicate if there is native or X-language support in each language with a link to the docs.
+         <p>Example:
+         <p>Kinesis I/O has a native implementation in Java and X-language support for Python, but no support for Go.
+      </td>
+  </tr>
+  <tr>
+   <td>
+      <p>Indicate known limitations under a <strong>Limitations</strong> 
header. If the limitation has a tracking issue, please link it inline.
+      <p>Example:
+      <p><a href="https://beam.apache.org/documentation/io/built-in/snowflake/#limitations">https://beam.apache.org/documentation/io/built-in/snowflake/#limitations</a>
+   </td>
+  </tr>
+</table>
+</div>
+
+
+
+### I/O (not built-in)
+
+Custom I/Os are not included in the Apache Beam Github repository. One example is [SolaceIO](https://github.com/SolaceProducts/solace-apache-beam).
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-connectors">
+   <tr>
+      <td>
+         <p>Update <strong>I/O Connectors</strong> with your I/O's information
+         <p>Example:
+         <p><a href="https://beam.apache.org/documentation/io/connectors/#other-io-connectors-for-apache-beam">https://beam.apache.org/documentation/io/connectors/#other-io-connectors-for-apache-beam</a>
+      </td>
+   </tr>
+</table>
+</div>
+
+## Development
+
+This section outlines API Syntax, Semantics and Feature Adoption 
recommendations for new and existing Apache Beam I/O Connectors.
+
+Development guidelines are written with the following principles in mind:
+
+
+
+* Consistency makes an API easier to learn
+    * If there are multiple ways of doing something, we should strive to be 
consistent first
+* With a couple of minutes of studying documentation, users should be able to pick up most I/O connectors
+* The design of a new I/O should consider the possibility of evolution
+* Transforms should integrate well with other Beam utilities
+
+
+### All SDKs
+
+
+#### Pipeline Configuration / Execution / Streaming / Windowing semantics 
guidelines
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-io-standards">
+   <tr>
+      <th>
+         <p>Topic
+      </th>
+      <th>
+         <p>Semantics
+      </th>
+   </tr>
+   <tr>
+      <td>
+         <p>Pipeline Options
+      </td>
+      <td>
+         <p>An I/O should rarely rely on a PipelineOptions subclass to tune 
internal parameters.
+         <p>A connector-related pipeline options class should:
+         <ul>
+            <li>Document clearly, for each option, the effect it has and why one may modify it.
+            <li>Namespace option names to avoid collisions
+            <li>Name the class {Connector}Options
+            <li>Name the methods set{Connector}{Option} and get{Connector}{Option}
+            </li>
+         </ul>
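+         <p>For illustration, a minimal sketch of such a class for a hypothetical Foo connector (the option itself is made up):
+         <p>{{< highlight java >}}
+public interface FooOptions extends PipelineOptions {
+  // Namespaced with the connector name to avoid collisions with
+  // options defined by other connectors.
+  @Description("Maximum number of retries for Foo requests; increase for flaky networks.")
+  @Default.Integer(5)
+  Integer getFooMaxRetries();
+
+  void setFooMaxRetries(Integer maxRetries);
+}
+{{< /highlight >}}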
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Source Windowing
+      </td>
+      <td>
+         <p>A source must return elements in the GlobalWindow unless 
explicitly parameterized in the API by the user.
+         <p>Allowable Non-global-window patterns:
+         <ul>
+            <li>ReadFromIO(window_by=...)
+            <li>ReadFromIO.IntoFixedWindows(...)
+            <li>ReadFromIO(apply_windowing=True/False) (e.g. <a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.periodicsequence.html#apache_beam.transforms.periodicsequence.PeriodicImpulse">PeriodicImpulse</a>)
+            <li>IO.read().withWindowing(...)
+            <li>IO.read().windowBy(...)
+            <li>IO.read().withFixedWindows(...)
+            </li>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Sink Windowing
+      </td>
+      <td>
+         <p>A sink should be window-agnostic and handle elements sent with any windowing method, unless explicitly parameterized or expressed otherwise in its API.
+         <p>A sink may change the windowing of a PCollection internally however it needs; however, the metadata that it returns as part of its Result object must be:
+         <ul>
+            <li>In the same window, unless explicitly declared in the API
+            <li>With accurate timestamps
+            <li><strong>It may</strong> also return metadata with information 
about windowing (e.g. a BigQuery job may have a timestamp, but also a window 
associated with it).
+         </ul>
+         <p>Allowable non-global-window patterns:
+         <ul>
+            <li>WriteToIO(triggering_frequency=...) - e.g. <a href="https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.bigquery.html#apache_beam.io.gcp.bigquery.WriteToBigQuery">WriteToBigQuery</a> (This only sets the windowing within the transform - input data is still in the Global Window).
+            <li>WriteBatchesToIO(...)
+            <li>WriteWindowsToIO(...)
+            </li>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Throttling
+      </td>
+      <td>
+         <p>A streaming sink (or any transform accessing an external service) may implement throttling of its requests to prevent overloading the external service.
+         <p>TODO: Beam should expose throttling utilities (<a href="https://github.com/apache/beam/issues/24743">Tracking Issue</a>):
+         <ul>
+            <li>Per-key fixed throttling
+            <li>Adaptive throttling with sink-reported backpressure
+            <li>Ramp-up throttling from a start point
+            </li>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Error handling
+      </td>
+      <td>
+         <p>TODO: <a href="https://github.com/apache/beam/issues/24742">Tracking Issue</a>
+      </td>
+   </tr>
+</table>
+</div>
+
+
+
+### Java
+
+
+#### General
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-io-standards">
+   <tr>
+      <td>
+         <p>The primary class used in working with the connector should be 
named <strong>{connector}IO</strong>
+         <p>Example:
+         <p>The BigQuery I/O is <strong>org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO</strong>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>The class should be placed in the package <strong>org.apache.beam.sdk.io.{connector}</strong>
+         <p>Example:
+         <p>BigQueryIO belongs in the java package <a href="https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java">org.apache.beam.sdk.io.gcp.bigquery</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>The unit/integration/performance tests should live under the package <strong>org.apache.beam.sdk.io.{connector}.testing</strong>. This will cause the various tests to work with the standard user-facing interfaces of the connector.
+         <p>Unit tests should reside in the same package (i.e. <strong>org.apache.beam.sdk.io.{connector}</strong>), as they may often test internals of the connector.
+         <p>Example:
+         <p>BigQueryIO's unit tests live alongside it in the java package <a href="https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java">org.apache.beam.sdk.io.gcp.bigquery</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>An I/O transform should avoid receiving user lambdas to map 
elements from a user type to a connector-specific type. Instead, they should 
interface with a connector-specific data type (with schema information when 
possible).
+         <p>When necessary, an I/O transform should receive a type parameter that specifies the input type (for sinks) or output type (for sources) of the transform.
+         <p>An I/O transform may omit a type parameter <strong>only if it is certain that its output type will not change</strong> (e.g. <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.MatchAll.html">FileIO.MatchAll</a> and other <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.html">FileIO transforms</a>).
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>As part of the API of an I/O, it is <strong>highly discouraged</strong> to directly expose third-party libraries in the public API of a Beam API or connector, because:
+         <ul>
+            <li>It reduces Apache Beam’s compatibility guarantees - changes to third-party libraries can/will directly break existing users’ pipelines, and Beam’s backwards compatibility guarantees make dependency upgrades difficult.
+            <li>It makes code maintainability hard - if libraries are directly exposed at the API level, a dependency change will require multiple changes throughout the I/O implementation code
+            <li>It forces third-party dependencies onto end users
+            </li>
+         </ul>
+         <p>Instead, we highly recommend exposing Beam-native interfaces and implementing an adapter to translate to the third-party library’s objects.
+         <p>If you believe that the library in question is extremely static in nature, please note it in the I/O itself.
+         <p>This requirement has the shortcoming that Beam will simply mirror external library API objects, but it allows us to upgrade external libraries when needed, and lets users deal with versions as well.
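+         <p>A minimal sketch of the adapter idea (the Beam-owned TopicPartition class below is hypothetical, mirroring Kafka's type of the same name):
+         <p>{{< highlight java >}}
+// Beam-native type exposed in the public connector API.
+public class TopicPartition {
+  private final String topic;
+  private final int partition;
+
+  public TopicPartition(String topic, int partition) {
+    this.topic = topic;
+    this.partition = partition;
+  }
+
+  public String getTopic() { return topic; }
+  public int getPartition() { return partition; }
+}
+
+// Internal adapter: the third-party type never leaks into the public API.
+class TopicPartitionAdapter {
+  static org.apache.kafka.common.TopicPartition toKafka(TopicPartition beamTp) {
+    return new org.apache.kafka.common.TopicPartition(
+        beamTp.getTopic(), beamTp.getPartition());
+  }
+}
+{{< /highlight >}}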
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Sources and Sinks should be abstracted with a PTransform wrapper, with internal classes declared protected or private. By doing so, implementation details can be added/changed/modified without breaking user code that depends on the connector.
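+         <p>A minimal sketch of this structure for a hypothetical FooIO (names and read logic are illustrative only):
+         <p>{{< highlight java >}}
+public class FooIO {
+  // Users only ever interact with the PTransform wrapper...
+  public static Read read() {
+    return new Read();
+  }
+
+  public static class Read extends PTransform<PBegin, PCollection<String>> {
+    @Override
+    public PCollection<String> expand(PBegin input) {
+      // ...while the implementation stays private and free to change.
+      return input
+          .apply(Impulse.create())
+          .apply(ParDo.of(new FooReaderFn()));
+    }
+  }
+
+  private static class FooReaderFn extends DoFn<byte[], String> {
+    @ProcessElement
+    public void process(OutputReceiver<String> out) {
+      out.output("record-from-foo"); // stand-in for real read logic
+    }
+  }
+}
+{{< /highlight >}}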
+      </td>
+   </tr>
+</table>
+</div>
+
+
+#### Classes / Methods / Properties
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-io-standards">
+   <tr>
+      <th>
+         <p>Java Syntax
+      </th>
+      <th>
+         <p>Semantics
+      </th>
+   </tr>
+   <tr>
+      <td>
+         <p>class IO.Read
+      </td>
+      <td>
+         <p>Gives access to the class that represents reads within the I/O. The Read class should implement a fluent interface similar to the builder pattern (e.g. withX(...).withY(...)). Together with default values, this provides a fail-fast interface (with immediate validation feedback after each .withX()) and slightly less verbosity compared to the builder pattern.
+         <p>A user should <strong>not</strong> create this class directly. It should be created by a top-level utility method.
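+         <p>A minimal sketch of the fail-fast, immutable fluent style, using a hypothetical batchSize option:
+         <p>{{< highlight java >}}
+public static class Read extends PTransform<PBegin, PCollection<String>> {
+  private final int batchSize;
+
+  private Read(int batchSize) {
+    this.batchSize = batchSize;
+  }
+
+  // Validates eagerly, at pipeline construction time, and returns a
+  // new instance instead of mutating this one.
+  public Read withBatchSize(int batchSize) {
+    if (batchSize <= 0) {
+      throw new IllegalArgumentException("batchSize must be positive");
+    }
+    return new Read(batchSize);
+  }
+
+  @Override
+  public PCollection<String> expand(PBegin input) {
+    throw new UnsupportedOperationException("read logic elided in this sketch");
+  }
+}
+{{< /highlight >}}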
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>class IO.ReadAll
+      </td>
+      <td>
+         <p>A few different sources implement runtime configuration for 
reading from a data source. This is a valuable pattern because it enables a 
purely batch source to become a more sophisticated streaming source.
+         <p>As much as possible, this type of transform should have the type 
richness of a construction-time-configured transform:
+         <ul>
+            <li>Support Beam Row output with a schema known at 
construction-time
+            <li>Extra configuration may be needed (and acceptable) in this 
case (e.g. a SchemaProvider parameter, a Schema parameter, a Schema Catalog or 
a utility of that sort).
+            <li>The input PCollection should have a fixed type with a schema, 
so it can be easily manipulated by users.
+            </li>
+         </ul>
+         <p>Example:
+         <p><a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/jdbc/JdbcIO.ReadAll.html">JdbcIO.ReadAll</a>, <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/parquet/ParquetIO.ReadFiles.html">ParquetIO.ReadFiles</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>class IO.Write
+      </td>
+      <td>
+         <p>Gives access to the class that represents writes within the I/O. The Write class should implement a fluent interface similar to the builder pattern (e.g. withX(...).withY(...)).
+         <p>A user should not create this class directly. It should be created by a top-level utility method.
+         <ul>
+            <li>Support Beam Row output with a schema known at 
construction-time
+            <li>Extra configuration may be needed (and acceptable) in this 
case (e.g. a SchemaProvider parameter, a Schema parameter, a Schema Catalog or 
a utility of that sort).
+            <li>The input PCollection should have a fixed type with a schema, 
so it can be easily manipulated by users.
+            </li>
+         </ul>
+         <p>Example:
+         <p><a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/jdbc/JdbcIO.ReadAll.html">JdbcIO.ReadAll</a>, <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/parquet/ParquetIO.ReadFiles.html">ParquetIO.ReadFiles</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Other Transform Classes
+      </td>
+      <td>
+         <p>Some data storage and external systems implement APIs that do not adjust easily to Read or Write semantics (e.g. <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.html">FhirIO implements several different transforms</a> that fetch or send data to Fhir).
+         <p>These classes should be added <strong>only if it is impossible or prohibitively difficult to encapsulate their functionality as part of extra configuration of Read, Write and ReadAll</strong> transforms, to avoid increasing the cognitive load on users.
+         <p>A user should not create these classes directly. They should be created by a top-level static method.
+         <ul>
+            <li>Support Beam Row output with a schema known at 
construction-time
+            <li>Extra configuration may be needed (and acceptable) in this 
case (e.g. a SchemaProvider parameter, a Schema parameter, a Schema Catalog or 
a utility of that sort).
+            <li>The input PCollection should have a fixed type with a schema, 
so it can be easily manipulated by users.
+            </li>
+         </ul>
+         <p>Example:
+         <p><a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/jdbc/JdbcIO.ReadAll.html">JdbcIO.ReadAll</a>, <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/parquet/ParquetIO.ReadFiles.html">ParquetIO.ReadFiles</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Utility Classes
+      </td>
+      <td>
+         <p>Some connectors rely on other user-facing classes to set 
configuration parameters.
+         <p>(e.g. <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/jdbc/JdbcIO.DataSourceConfiguration.html">JdbcIO.DataSourceConfiguration</a>). These classes should be <strong>nested within the {Connector}IO class</strong>.
+         <p>This format makes them visible in the main Javadoc, and easy to 
discover by users.
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Method IO&lt;T&gt;.write()
+      </td>
+      <td>
+         <p>The top-level I/O class will provide a <strong>static 
method</strong> to start constructing an I/O.Write transform. This returns a 
PTransform with a single input PCollection, and a Write.Result output.
+         <p>This method should not specify in its name any of the following:
+         <ul>
+            <li>Internal data format
+            <li>Strategy used to write data
+            <li>Input or output data type
+            </li>
+         </ul>
+      <p>The above should be specified via configuration parameters if 
possible. <strong>If impossible</strong>, then <strong>a new static 
method</strong> may be introduced, but this <strong>must be 
exceptional</strong>.
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Method IO&lt;T&gt;.read()
+      </td>
+      <td>
+         <p>The method to start constructing an I/O.Read transform. This 
returns a PTransform with a single output PCollection.
+         <p>This method should not specify in its name any of the following:
+         <ul>
+            <li>Internal data format
+            <li>Strategy used to read data
+            <li>Output data type
+            </li>
+         </ul>
+      <p>The above should be specified via configuration parameters if 
possible. <strong>If not possible</strong>, then <strong>a new static 
method</strong> may be introduced, but this <strong>must be exceptional, and 
documented in the I/O header as part of the API</strong>.
+      <p>The initial static constructor method may receive parameters if these are few and general, or if they are necessary to configure the transform (e.g. <a href="https://beam.apache.org/releases/javadoc/2.29.0/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.html#exportResourcesToGcs-java.lang.String-java.lang.String-">FhirIO.exportResourcesToGcs</a>, <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/jdbc/JdbcIO.html#readWithPartitions-org.apache.beam.sdk.values.TypeDescriptor-">JdbcIO.ReadWithPartitions</a> needs a TypeDescriptor for initial configuration).
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>IO.Read.from(source)
+      </td>
+      <td>
+         <p>A Read transform must provide a <strong>from</strong> method where 
users can specify where to read from. If a transform can read from different 
<em>kinds</em> of sources (e.g. tables, queries, topics, partitions), then 
multiple implementations of this from method can be provided to accommodate 
this:
+         <ul>
+            <li>IO.Read from(Query query)
+            <li>IO.Read from(Table table) / from(String table)
+            <li>IO.Read from(Topic topic)
+            <li>IO.Read from(Partition partition)
+            </li>
+         </ul>
+         <p>The input type for these methods can reflect the external source’s API (e.g. <a href="https://kafka.apache.org/27/javadoc/?org/apache/kafka/common/TopicPartition.html">Kafka TopicPartition</a> should use a <strong>Beam-implemented</strong> TopicPartition object).
+         <p>Sometimes, there may be multiple <strong>from</strong> locations that use the same input type, which means we cannot leverage method overloading. In that case, use a new method name to distinguish them:
+         <ul>
+            <li>IO.Read from(String table)
+            <li>IO.Read fromQuery(String query)
+            </li>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>IO.Read.fromABC(String abc)
+      </td>
+      <td>
+         <p><strong>This pattern is discouraged unless method overloading is impossible.</strong> Refer to the Read.from(source) guidance above.
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>IO.Write.to(destination)
+      </td>
+      <td>
+         <p>A Write transform must provide a <strong>to</strong> method where users can specify where to write data. If a transform can write to different <em>kinds</em> of destinations while still using the same input element type (e.g. tables, queries, topics, partitions), then multiple implementations of this to method can be provided to accommodate this:
+         <ul>
+            <li>IO.Write to(Query query)
+            <li>IO.Write to(Table table) / to(String table)
+            <li>IO.Write to(Topic topic)
+            <li>IO.Write to(Partition partition)
+            </li>
+         </ul>
+         <p>The input type for these methods can use an Apache Beam utility that reflects the external source’s API (e.g. <a href="https://kafka.apache.org/27/javadoc/?org/apache/kafka/common/TopicPartition.html">Kafka TopicPartition</a>). As noted in the guidance above, Apache Beam, or an interface given by Apache Beam, should not expose external libraries.
+         <p>If different kinds of destinations require different input object types, then these should be done in separate I/O connectors.
+         <p>Sometimes, there may be multiple <strong>to</strong> locations that use the same input type, which means we cannot leverage method overloading. In that case, use a new method name to distinguish them:
+         <ul>
+            <li>IO.Write to(String table)
+            <li>IO.Write toTable(String table)
+            </li>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>IO.Write.to(DynamicDestination destination)
+      </td>
+      <td>
+         <p>A write transform may enable writing to more than one destination. 
This can be a complicated pattern that should be implemented carefully (it is 
the preferred pattern for connectors that will likely have multiple 
destinations in a single pipeline).
+         <p>The preferred pattern for this is to define a DynamicDestinations interface (e.g. <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/DynamicDestinations.html">BigQueryIO.DynamicDestinations</a>) that will allow the user to define all necessary parameters for the configuration of the destination.
+         <p>The DynamicDestinations interface also allows maintainers to add 
new methods over time (with <strong>default implementations</strong> to avoid 
breaking existing users) that will define extra configuration parameters if 
necessary.
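+         <p>A minimal sketch of such an interface for a hypothetical FooIO, using default methods so configuration can grow without breaking users:
+         <p>{{< highlight java >}}
+public interface DynamicDestinations<T> extends Serializable {
+  // Compute the destination from the element being written.
+  String getDestination(T element);
+
+  // Added later: the default implementation keeps existing user
+  // implementations compiling and running unchanged.
+  default Map<String, String> getDestinationAttributes(T element) {
+    return Collections.emptyMap();
+  }
+}
+{{< /highlight >}}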
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>IO.Write.toABC(destination)
+      </td>
+      <td>
+         <p><strong>This pattern is discouraged unless method overloading is impossible.</strong> Refer to the Write.to(destination) guidance above.
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>class IO.Read.withX
+         <p>IO.Write.withX
+      </td>
+      <td>
+         <p>withX provides a method for configuration to be passed to the Read method, where X represents the configuration to be set. With the exception of generic with statements (defined below), the I/O should attempt to match the name of the configuration option with the option’s name in the source.
+         <p>These methods should return a new instance of the I/O rather than modifying the existing instance.
+         <p>Example:
+         <p><a href="https://beam.apache.org/releases/javadoc/2.35.0/org/apache/beam/sdk/io/TextIO.Read.html#withCompression-org.apache.beam.sdk.io.Compression-">TextIO.Read.withCompression</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>IO.Read.withConfigObject
+         <p>IO.Write.withConfigObject
+      </td>
+      <td>
+         <p>Some connectors in Java receive a configuration object as part of 
their configuration. <strong>This pattern is encouraged only for particular 
cases</strong>. In most cases, a connector can hold all necessary configuration 
parameters at the top level.
+         <p>To determine whether a multi-parameter configuration object is an 
appropriate parameter for a high level transform, the configuration object must:
+         <ul>
+            <li>Hold only properties related to the connection/authentication 
parameters for the external data store (e.g. JdbcIO.DataSourceConfiguration).
+            <ul>
+               <li>Generally, <strong>secrets should not be passed as 
parameters</strong>, unless an alternative is not feasible. For secret 
management, a secret-management service or KMS is the recommended approach.
+            </ul>
+            <li><strong>Or </strong>mirror an API characteristic from the 
external data source (e.g. KafkaIO.Read.withConsumerConfigUpdates), without 
exposing that external API in the Beam API.
+            <ul>
+               <li>The method should mirror the name of the API object (e.g. 
given an object SubscriptionStatConfig, a method would be 
withSubscriptionStatConfig).
+            </ul>
+            <li><strong>Or</strong> when a connector can support different configuration ‘paths’ where a particular property requires other properties to be specified (e.g. BigQueryIO’s write methods entail various different properties; see the last examples below).
+            </li>
+         </ul>
+         <p>Example:
+         <p><a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/jdbc/JdbcIO.DataSourceConfiguration.html">JdbcIO.DataSourceConfiguration</a>, <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/spanner/SpannerConfig.html">SpannerConfig</a>, <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kafka/KafkaIO.ReadSourceDescriptors.html#withConsumerConfigUpdates-java.util.Map-">KafkaIO.Read.withConsumerConfigUpdates</a>
+         <p>{{< highlight java >}}
+BigQueryIO.write()
+   .withWriteConfig(FileLoadsConfig.withAvro()
+   .withTriggeringFrequency()...)
+
+BigQueryIO.write()
+   .withWriteConfig(StreamingInsertsConfig.withDetailedError()
+   .withExactlyOnce().etc..)
+            {{< /highlight >}}
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>class IO.Write.withFormatFunction
+      </td>
+      <td>
+         <p><strong>Discouraged - except for dynamic destinations</strong>
+         <p>For sinks that can receive Beam Row-typed PCollections, the format function should not be necessary, because Beam should be able to format the input data based on its schema.
+         <p>For sinks providing Dynamic Destination functionality, elements may carry data that helps determine their destination. This data may need to be removed before writing to the final destination.
+         <p>To include this method, a connector should:
+         <ul>
+            <li>Show that it’s not possible to perform data matching 
automatically
+            <li>Support Dynamic Destinations and need changes to the input 
data due to that reason
+            </li>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>IO.Read.withCoder
+         <p>IO.Write.withCoder
+      </td>
+      <td>
+         <p><strong>Strongly Discouraged</strong>
+         <p>Sets the coder to use to encode/decode the element type of the 
output / input PCollection of this connector. In general, it is recommended 
that sources will:
+         <ol>
+            <li>Return Row objects with a schema that is automatically inferred
+            <li>Automatically set the necessary coder by having fixed 
output/input types, or inferring their output/input types
+            </li>
+         </ol>
+         <p>If neither #1 nor #2 is possible, then a withCoder method can be added.
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>IO.ABC.withEndpoint / with{IO}Client / withClient
+      </td>
+      <td>
+         <p>Connector transforms should provide a method to override the interface between themselves and the external system that they communicate with. This can enable various uses:
+         <ul>
+            <li>Local testing by mocking the destination service
+            <li>User-enabled metrics, monitoring, and security handling in the client
+            <li>Integration testing based on emulators
+            </li>
+         </ul>
+         <p>Example:
+         <p><a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html#withTestServices-org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices-">BigQueryIO.Write.withTestServices</a>(<a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServices.html">BigQueryServices</a>)
+      </td>
+   </tr>
+</table>
+</div>
+
+
+
+#### Types
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-io-standards">
+   <tr>
+      <th>
+         <p>Java Syntax
+      </th>
+      <th>
+         <p>Semantics
+      </th>
+   </tr>
+   <tr>
+      <td>
+         <p>Method IO.Read.expand
+      </td>
+      <td>
+         <p>The expand method of a Read transform must return a PCollection 
object with a type. The type may be parameterized or fixed to a class.
+         <p>A user should <strong>not</strong> create this class directly. It should be created by a top-level utility method.
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Method IO.Read.expand’s PCollection type
+      </td>
+      <td>
+         <p>The type of the PCollection will usually be one of the following four options. For each of these options, the encoding / data is recommended to be as follows:
+         <ul>
+            <li>A pre-defined, basic Java type (e.g. String)
+            <ul>
+               <li>This encoding should be simple, and use a simple Beam coder (e.g. StringUtf8Coder)
+               </li>
+            </ul>
+            <li>A pre-set POJO type (e.g. <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/fs/MatchResult.Metadata.html">Metadata</a>) with a schema
+            <ul>
+               <li>The preferred strategy for these is to define the output type as an <a href="https://stackoverflow.com/questions/62546191/how-do-i-use-an-autovalue-data-type-for-my-pcollection-in-apache-beam">@AutoValue with @DefaultSchema and @SchemaCreate</a> annotations. This will ensure compact, fast encoding with RowCoder.
+               </li>
+            </ul>
+            <li>A Beam Row with a specific schema
+            <li>A type with a schema that’s not known at construction time
+            </li>
+         </ul>
+         <p>In all cases, asking a user to pass a coder (e.g. withCoder(...)) 
is <strong>discouraged</strong>.
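+         <p>A minimal sketch of the @AutoValue approach, with a hypothetical record type:
+         <p>{{< highlight java >}}
+@DefaultSchema(AutoValueSchema.class)
+@AutoValue
+public abstract class FooRecord {
+  public abstract String getName();
+  public abstract long getValue();
+
+  @SchemaCreate
+  public static FooRecord create(String name, long value) {
+    // AutoValue generates the concrete AutoValue_FooRecord class.
+    return new AutoValue_FooRecord(name, value);
+  }
+}
+{{< /highlight >}}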
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>method IO.Write.expand
+      </td>
+      <td>
+         <p>The expand method of any write transform must return an IO.Write.Result object that extends a PCollectionTuple. This object allows transforms to return metadata about the results of their writing, and allows the write to be followed by other PTransforms.
+         <p>If the Write transform does not need to return any metadata, a Write.Result object <strong>is still preferable</strong>, because it will allow the transform to evolve its metadata over time.
+         <p>Examples of metadata:
+         <ul>
+            <li>Failed elements and errors
+            <li>Successfully written elements
+            <li>API tokens from calls issued by the transform
+            </li>
+         </ul>
+         <p>Examples:
+         <p>BigQueryIO’s <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.html">WriteResult</a>
+      </td>
+   </tr>
+</table>
+</div>
+
+#### Evolution
+
+Over time, I/Os need to evolve to address new use cases, or to use new APIs under the covers. Some examples of necessary evolution of an I/O:
+
+* A new data type needs to be supported within it (e.g. [any-type partitioning 
in JdbcIO.ReadWithPartitions](https://github.com/apache/beam/pull/15848))
+* A new backend API needs to be supported (e.g.

Review Comment:
   removed.



##########
website/www/site/content/en/documentation/io/io-standards.md:
##########
@@ -0,0 +1,1452 @@
+---
+title: "IO Standards"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# I/O Standards
+
+## Overview
+
+This Apache Beam I/O Standards document lays out the prescriptive guidance for 
1P/3P developers developing an Apache Beam I/O connector. These guidelines aim 
to create best practices encompassing documentation, development and testing in 
a simple and concise manner.
+
+
+### What are built-in I/O Connectors?
+
+An I/O connector (I/O) living in the Apache Beam Github repository is known as 
a **Built-in I/O connector**. Built-in I/O’s have their [integration 
tests](#integration-tests) and performance tests routinely run by the Google 
Cloud Dataflow Team using the Dataflow Runner and metrics published publicly 
for [reference](#dashboard). Otherwise, the following guidelines will apply to 
both unless explicitly stated.
+
+
+# Guidance
+
+
+## Documentation
+
+This section lays out the superset of all documentation that is expected to be 
made available with an I/O. The Apache Beam documentation referenced throughout 
this section can be found [here](https://beam.apache.org/documentation/). And 
generally a good example to follow would be the built-in I/O, [Snowflake 
I/O](https://beam.apache.org/documentation/io/built-in/snowflake/).
+
+
+### Built-in I/O
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-io-standards">
+   <tr>
+      <td>
+         <p>Provided code docs for the relevant language of the I/O. This 
should also have links to any external sources of information within the Apache 
Beam site or external documentation location.
+         <p>Examples:
+         <ul>
+            <li><a 
href="https://beam.apache.org/releases/javadoc/current/overview-summary.html";>Java
 doc</a>
+            <li><a 
href="https://beam.apache.org/releases/pydoc/current/";>Python doc</a>
+            <li><a 
href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam";>Go doc</a>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Add a new page under <strong>I/O connector guides</strong> that 
covers specific tips and configurations. The following shows those for <a 
href="https://beam.apache.org/documentation/io/built-in/parquet/";>Parquet</a>, 
<a href="https://beam.apache.org/documentation/io/built-in/hadoop/";>Hadoop</a> 
and others.
+         <p>Examples:
+         <p><img src="/images/io-standards/io-connector-guides-screenshot.png" 
width="" alt="I/O connector guides screenshot" title="I/O connector guides 
screenshot"></img>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Formatting of the section headers in your Javadoc/Pythondoc should 
be consistent throughout such that programmatic information extraction for 
other pages can be enabled in the future.
+         <p>Example <strong>subset</strong> of sections to include in your 
page in order:
+         <ol>
+            <li>Before you start
+            <li>{Connector}IO basics
+            <li>Supported Features
+               <ol>
+                  <li>Relational
+                  </li>
+               </ol>
+            <li>Authentication
+            <li>Reading from {Connector}
+            <li>Writing to {Connector}
+            <li><a href="#unit-tests">Resource scalability</a>
+            <li>Limitations
+            <li>Reporting an Issue
+            </li>
+         </ol>
+         <p>Example:
+         <p>The KafkaIO <a 
href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kafka/KafkaIO.html";>JavaDoc</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>I/O Connectors should include a note indicating <a 
href="https://2022.beamsummit.org/sessions/relational-beam/";>Relational 
Features</a> supported in their page under <strong>I/O connector 
guides</strong>.
+         <p>Relational Features are concepts that can help improve efficiency 
and can optionally be implemented by an I/O Connector. Using end user supplied 
pipeline configuration (<a 
href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/io/SchemaIO.html";>SchemaIO</a>)
 and user query (<a 
href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/FieldAccessDescriptor.html";>FieldAccessDescriptor</a>)
 data, relational theory is applied to derive improvements such as faster 
pipeline execution, lower operation costs and less data read/written.
+         <p>Example table:
+         <p><img 
src="/images/io-standards/io-supported-relational-features-table.png" width="" 
alt="Supported Relational Features" title="Supported Relational Features"></img>
+         <p>Example implementations:
+         <p>BigQueryIO <a 
href="https://github.com/apache/beam/blob/5bb13fa35b9bc36764895c57f23d3890f0f1b567/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1813";>Column
 Pruning</a> via ProjectionPushdown to return only necessary columns indicated 
by an end user's query. This is achieved using BigQuery DirectRead API.
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Add a page under <strong>Common pipeline patterns</strong>, if 
necessary, outlining common usage patterns involving your I/O.
+         <p><a 
href="https://beam.apache.org/documentation/patterns/bigqueryio/";>https://beam.apache.org/documentation/patterns/bigqueryio/</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Update <strong>I/O Connectors</strong> with your I/O’s information
+         <p>Example:
+         <p><a 
href="https://beam.apache.org/documentation/io/connectors/#built-in-io-connectors";>https://beam.apache.org/documentation/io/connectors/#built-in-io-connectors</a>
+         <p><img src="/images/io-standards/io-supported-via-screenshot.png" 
width="" alt="alt_text" title="image_tooltip">
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Provide setup steps to use the I/O, under a <strong>Before you 
start Header</strong>
+         <p>Example:
+         <p><a 
href="https://beam.apache.org/documentation/io/built-in/parquet/#before-you-start";>https://beam.apache.org/documentation/io/built-in/parquet/#before-you-start</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Include a canonical read/write code snippet after the initial 
description for each supported language. The below example shows Hadoop with 
examples for Java.
+         <p>Example:
+         <p><a 
href="https://beam.apache.org/documentation/io/built-in/hadoop/#reading-using-hadoopformatio";>https://beam.apache.org/documentation/io/built-in/hadoop/#reading-using-hadoopformation</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Indicate how timestamps for elements are assigned. This includes 
batch sources to allow for future I/Os which may provide more useful 
information than current_time().
+         <p>Example:
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Indicate how timestamps are advanced; for Batch sources this will 
be marked as n/a in most cases.
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Outline any temporary resources (for example, files) that the 
connector will create.
+         <p>Example:
+         <p>BigQuery batch loads first create a temp GCS location
+         <p><a 
href="https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L455";>https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L455</a>
+      </td>
+  </tr>
+  <tr>
+      <td>
+         <p>Provide, under an <strong>Authentication</strong> subheader, how 
to acquire partner authorization material to securely access the source/sink.
+         <p>Example:
+         <p><a 
href="https://beam.apache.org/documentation/io/built-in/snowflake/#authentication";>https://beam.apache.org/documentation/io/built-in/snowflake/#authentication</a>
+         <p>Here BigQuery names it permissions but the topic covers 
similarities
+         <p><a 
href="https://beam.apache.org/releases/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html";>https://beam.apache.org/releases/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html</a>
+      </td>
+  </tr>
+  <tr>
+      <td>
+         <p>I/Os should provide links to the Source/Sink documentation within 
<strong>Before you start Header</strong>
+         <p>Example:
+         <p><a 
href="https://beam.apache.org/documentation/io/built-in/snowflake/";>https://beam.apache.org/documentation/io/built-in/snowflake/</a>
+      </td>
+  </tr>
+  <tr>
+      <td>
+         <p>Indicate if there is native or X-language support in each language 
with a link to the docs.
+         <p>Example:
+         <p>Kinesis I/O has a native implementation of java and X-language 
support for python but no support for Golang.
+      </td>
+  </tr>
+  <tr>
+   <td>
+      <p>Indicate known limitations under a <strong>Limitations</strong> 
header. If the limitation has a tracking issue, please link it inline.
+      <p>Example:
+      <p><a 
href="https://beam.apache.org/documentation/io/built-in/snowflake/#limitations";>https://beam.apache.org/documentation/io/built-in/snowflake/#limitations</a>
+   </td>
+  </tr>
+</table>
+</div>
+
+
+
+### I/O (not built-in)
+
+Custom I/Os are not included in the Apache Beam Github repository. Some 
examples would be 
[Solace](https://github.com/SolaceProducts/solace-apache-beam)IO.
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-connectors">
+   <tr>
+      <td>
+         <p>Update I/O connectors with your I/O information
+         <p>Example:
+         <p><a 
href="https://beam.apache.org/documentation/io/connectors/#other-io-connectors-for-apache-beam";>https://beam.apache.org/documentation/io/built-in/snowflake/#limitations</a>
+      </td>
+   </tr>
+</table>
+</div>
+
+## Development
+
+This section outlines API Syntax, Semantics and Feature Adoption 
recommendations for new and existing Apache Beam I/O Connectors.
+
+Development guidelines are written with the following principles in mind:
+
+
+
+* Consistency makes an API easier to learn
+    * If there are multiple ways of doing something, we should strive to be 
consistent first
+* With a couple minutes of studying documentation, users should be able to 
pick up most I/O connectors
+* The design of a new I/O should consider the possibility of evolution
+* Transforms should integrate well with other Beam utilities
+
+
+### All SDKs
+
+
+#### Pipeline Configuration / Execution / Streaming / Windowing semantics 
guidelines
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-io-standards">
+   <tr>
+      <th>
+         <p>Topic
+      </th>
+      <th>
+         <p>Semantics
+      </th>
+   </tr>
+   <tr>
+      <td>
+         <p>Pipeline Options
+      </td>
+      <td>
+         <p>An I/O should rarely rely on a PipelineOptions subclass to tune 
internal parameters.
+         <p>A connector-related pipeline options class should:
+         <ul>
+            <li>Document clearly, for each option, the effect it has and why 
one may modify it.
+            <li>Option names must be namespaced to avoid collisions
+            <li>Class Name: {Connector}Options
+            <li>Method names: .set{Connector}{Option}, get{Connector}{Option}
+            </li>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Source Windowing
+      </td>
+      <td>
+         <p>A source must return elements in the GlobalWindow unless 
explicitly parameterized in the API by the user.
+         <p>Allowable Non-global-window patterns:
+         <ul>
+            <li>ReadFromIO(window_by=...)
+            <li>ReadFromIO.IntoFixedWindows(...)
+            <li>ReadFromIO(apply_windowing=True/False) (e.g. <a 
href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.periodicsequence.html#apache_beam.transforms.periodicsequence.PeriodicImpulse";>PeriodicImpulse</a>)
+            <li>IO.read().withWindowing(...)
+            <li>IO.read().windowBy(...)
+            <li>IO.read().withFixedWindows(...)
+            </li>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Sink Windowing
+      </td>
+      <td>
+         <p>A sink should be Window agnostic and handle elements sent with any 
Windowing methodexpect elements to be sent to it in the Global Window, unless 
explicitly parameterized or expressed in its API.
+         <p>A sink may change the windowing of a PCollection internally 
however it needs, however, the metadata that it returns as part of its Result 
object must be:
+         <ul>
+            <li>In the same window, unless explicitly declared in the API
+            <li>With accurate timestamps
+            <li><strong>It may</strong> also return metadata with information 
about windowing (e.g. a BigQuery job may have a timestamp, but also a window 
associated with it).
+         </ul>
+         <p>Allowable non-global-window patterns:
+         <ul>
+            <li>WriteToIO(triggering_frequency=...) - e.g. <a 
href="https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.bigquery.html#apache_beam.io.gcp.bigquery.WriteToBigQuery";>WriteToBigQuery</a>
 (This only sets the windowing within the transform - input data is still in 
the Global Window).
+            <li>WriteBatchesToIO(...)
+            <li>WriteWindowsToIO(...)
+            </li>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Throttling
+      </td>
+      <td>
+         <p>A streaming sink (or any transform accessing an external service) 
may implement throttling of its requests to prevent from overloading the 
external service.
+         <p>TODO: Beam should expose throttling utilities (<a 
href="https://github.com/apache/beam/issues/24743";>Tracking Issue</a>):
+         <ul>
+            <li>Per-key fixed throttling
+            <li>Adaptive throttling with sink-reported backpressure
+            <li>Ramp-up throttling from a start point
+            </li>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Error handling
+      </td>
+      <td>
+         <p>TODO: <a 
href="https://github.com/apache/beam/issues/24742";>Tracking Issue</a>
+      </td>
+   </tr>
+</table>
+</div>
+
+
+
+### Java
+
+
+#### General
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-io-standards">
+   <tr>
+      <td>
+         <p>The primary class used in working with the connector should be 
named <strong>{connector}IO</strong>
+         <p>Example:
+         <p>The BigQuery I/O is 
<strong>org.apache.beam.sdk.io.bigquery.BigQueryIO</strong>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>The class should be placed in the package 
<strong>org.apache.beam.sdk.io.{connector}</strong>
+         <p>Example:
+         <p>The BigQueryIO belongs in the java package <a 
href="https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java";>org.apache.beam.sdk.io.bigquery</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>The integration/performance tests should live under the package <strong>org.apache.beam.sdk.io.{connector}.testing</strong>. This ensures that the tests exercise the standard, user-facing interfaces of the connector.
+         <p>Unit tests should reside in the same package as the connector (i.e. <strong>org.apache.beam.sdk.io.{connector}</strong>), as they often test internals of the connector.
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>An I/O transform should avoid receiving user lambdas to map elements from a user type to a connector-specific type. Instead, it should interface with a connector-specific data type (with schema information when possible).
+         <p>When necessary, an I/O transform should receive a type parameter that specifies the input type (for sinks) or output type (for sources) of the transform.
+         <p>An I/O transform may not have a type parameter <strong>only if it 
is certain that its output type will not change</strong> (e.g. <a 
href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.MatchAll.html";>FileIO.MatchAll</a>
 and other <a 
href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.html";>FileIO
 transforms</a>).
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>As part of the API of an I/O, it is <strong>highly discouraged</strong> to directly expose third-party libraries in the public API of a Beam API or connector, because:
+         <ul>
+            <li>It reduces Apache Beam’s compatibility guarantees - changes to third-party libraries can/will directly break existing users’ pipelines.
+            <li>It makes code maintainability hard - if libraries are directly exposed at the API level, a dependency change will require multiple changes throughout the I/O implementation code.
+            <li>It forces third-party dependencies onto end users, and makes dependency upgrades difficult because of Beam’s backwards compatibility guarantees.
+            </li>
+         </ul>
+         <p>Instead, we highly recommend exposing Beam-native interfaces and implementing an adapter that translates to and from the third-party objects.
+         <p>If you believe that the library in question is extremely static in nature, please note it in the I/O itself.
+         <p>This approach has the shortcoming that Beam will simply mirror external library API objects, but it allows dependencies to be upgraded when needed, and lets users manage library versions themselves.
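+         <p>As a sketch of the adapter approach (using Kafka’s TopicPartition as the third-party type; BeamTopicPartition is a hypothetical name), Beam owns a small mirror class and converts internally, so the external library never appears in the public API:
+         <p>{{< highlight java >}}
+// Beam-owned value class mirroring the external library’s TopicPartition.
+public class BeamTopicPartition implements Serializable {
+  private final String topic;
+  private final int partition;
+
+  public BeamTopicPartition(String topic, int partition) {
+    this.topic = topic;
+    this.partition = partition;
+  }
+
+  // Internal adapter - not part of the public API.
+  org.apache.kafka.common.TopicPartition toKafka() {
+    return new org.apache.kafka.common.TopicPartition(topic, partition);
+  }
+}
+{{< /highlight >}}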
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Sources and sinks should be abstracted with a PTransform wrapper, and internal classes should be declared protected or private. That way, implementation details can be added, changed, or modified without breaking user code that depends on the connector. A minimal sketch of this shape follows.
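+         <p>{{< highlight java >}}
+// Sketch with hypothetical names: MyConnectorIO is the only public surface;
+// the DoFn that does the actual work stays private.
+public class MyConnectorIO {
+  private MyConnectorIO() {} // no instances; static entry points only
+
+  public static Read read() {
+    return new Read();
+  }
+
+  public static class Read extends PTransform<PBegin, PCollection<String>> {
+    @Override
+    public PCollection<String> expand(PBegin input) {
+      return input
+          .apply(Impulse.create())
+          .apply(ParDo.of(new ReadFn())); // implementation detail, free to change
+    }
+  }
+
+  // Private DoFn: can be added/changed/modified without breaking users.
+  private static class ReadFn extends DoFn<byte[], String> {
+    @ProcessElement
+    public void processElement(OutputReceiver<String> out) {
+      out.output("record"); // placeholder read logic
+    }
+  }
+}
+{{< /highlight >}}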
+      </td>
+   </tr>
+</table>
+</div>
+
+
+#### Classes / Methods / Properties
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-io-standards">
+   <tr>
+      <th>
+         <p>Java Syntax
+      </th>
+      <th>
+         <p>Semantics
+      </th>
+   </tr>
+   <tr>
+      <td>
+         <p>class IO.Read
+      </td>
+      <td>
+         <p>Gives access to the class that represents reads within the I/O. The Read class should implement a fluent interface similar to the builder pattern (e.g. withX(...).withY(...)). Together with default values, this provides a fail-fast approach (immediate validation feedback after each .withX()) and slightly less verbosity than the builder pattern.
+         <p>A user should <strong>not</strong> create this class directly. It 
should be created by a <a href="#bookmark=id.kafer4mjzh1m">top-level utility 
method</a>.
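+         <p>A short sketch of the fluent, fail-fast style (MyIO and its options are hypothetical):
+         <p>{{< highlight java >}}
+MyIO.Read read =
+    MyIO.read()
+        .from("my_table")
+        .withBatchSize(500); // validates immediately; throws if batchSize <= 0
+{{< /highlight >}}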
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>class IO.ReadAll
+      </td>
+      <td>
+         <p>A few different sources implement runtime configuration for 
reading from a data source. This is a valuable pattern because it enables a 
purely batch source to become a more sophisticated streaming source.
+         <p>As much as possible, this type of transform should have the type 
richness of a construction-time-configured transform:
+         <ul>
+            <li>Support Beam Row output with a schema known at 
construction-time
+            <li>Extra configuration may be needed (and acceptable) in this 
case (e.g. a SchemaProvider parameter, a Schema parameter, a Schema Catalog or 
a utility of that sort).
+            <li>The input PCollection should have a fixed type with a schema, 
so it can be easily manipulated by users.
+            </li>
+         </ul>
+         <p>Example:
+         <p><a 
href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/jdbc/JdbcIO.ReadAll.html";>JdbcIO.ReadAll</a>,
 <a 
href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/parquet/ParquetIO.ReadFiles.html";>ParquetIO.ReadFiles</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>class IO.Write
+      </td>
+      <td>
+         <p>Gives access to the class that represents writes within the I/O. The Write class should implement a fluent interface similar to the builder pattern (e.g. withX(...).withY(...)).
+         <p>A user should not create this class directly. It should be created 
by a <a href="#bookmark=id.7yk3g4vwt7yn">top-level utility method</a>.
+         <p>Example:
+         <p><a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/jdbc/JdbcIO.Write.html";>JdbcIO.Write</a>, <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/TextIO.Write.html";>TextIO.Write</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Other Transform Classes
+      </td>
+      <td>
+         <p>Some data storage and external systems implement APIs that do not 
adjust easily to Read or Write semantics (e.g. <a 
href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.html";>FhirIO
 implements several different transforms</a> that fetch or send data to Fhir).
+         <p>These classes should be added <strong>only if it is impossible or 
prohibitively difficult to encapsulate their functionality as part of extra 
configuration of Read, Write and ReadAll</strong> transforms, to avoid 
increasing the cognitive load on users.
+         <p>A user should not create these classes directly. They should be 
created by a <a href="#bookmark=id.7yk3g4vwt7yn">top-level static method</a>.
+         <p>Example:
+         <p><a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.html";>FhirIO</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Utility Classes
+      </td>
+      <td>
+         <p>Some connectors rely on other user-facing classes to set 
configuration parameters.
+         <p>(e.g. <a 
href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/jdbc/JdbcIO.DataSourceConfiguration.html";>JdbcIO.DataSourceConfiguration</a>).
 These classes should be <strong>nested within the {Connector}IO class</strong>.
+         <p>This format makes them visible in the main Javadoc, and easy to 
discover by users.
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Method IO&lt;T&gt;.write()
+      </td>
+      <td>
+         <p>The top-level I/O class will provide a <strong>static 
method</strong> to start constructing an I/O.Write transform. This returns a 
PTransform with a single input PCollection, and a Write.Result output.
+         <p>This method should not specify in its name any of the following:
+         <ul>
+            <li>Internal data format
+            <li>Strategy used to write data
+            <li>Input or output data type
+            </li>
+         </ul>
+      <p>The above should be specified via configuration parameters if 
possible. <strong>If impossible</strong>, then <strong>a new static 
method</strong> may be introduced, but this <strong>must be 
exceptional</strong>.
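+      <p>A sketch of the entry point and its result (names hypothetical):
+      <p>{{< highlight java >}}
+// write() starts construction; the expansion returns a Write.Result.
+MyIO.Write.Result result =
+    rows.apply(MyIO.write().to("my_table"));
+
+// The result object exposes write metadata, e.g. failed records.
+result.getFailedRows().apply(/* handle or log failures */ ...);
+{{< /highlight >}}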
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Method IO&lt;T&gt;.read()
+      </td>
+      <td>
+         <p>The method to start constructing an I/O.Read transform. This 
returns a PTransform with a single output PCollection.
+         <p>This method should not specify in its name any of the following:
+         <ul>
+            <li>Internal data format
+            <li>Strategy used to read data
+            <li>Output data type
+            </li>
+         </ul>
+      <p>The above should be specified via configuration parameters if 
possible. <strong>If not possible</strong>, then <strong>a new static 
method</strong> may be introduced, but this <strong>must be exceptional, and 
documented in the I/O header as part of the API</strong>.
+      <p>The initial static constructor method may receive parameters if these 
are few and general, or if they are necessary to configure the transform (e.g. 
<a 
href="https://beam.apache.org/releases/javadoc/2.29.0/org/apache/beam/sdk/io/gcp/healthcare/FhirIO.html#exportResourcesToGcs-java.lang.String-java.lang.String-";>FhirIO.exportResourcesToGcs</a>,
 <a 
href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/jdbc/JdbcIO.html#readWithPartitions-org.apache.beam.sdk.values.TypeDescriptor-";>JdbcIO.ReadWithPartitions</a>
 needs a TypeDescriptor for initial configuration).
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>IO.Read.from(source)
+      </td>
+      <td>
+         <p>A Read transform must provide a <strong>from</strong> method where 
users can specify where to read from. If a transform can read from different 
<em>kinds</em> of sources (e.g. tables, queries, topics, partitions), then 
multiple implementations of this from method can be provided to accommodate 
this:
+         <ul>
+            <li>IO.Read from(Query query)
+            <li>IO.Read from(Table table) / from(String table)
+            <li>IO.Read from(Topic topic)
+            <li>IO.Read from(Partition partition)
+            </li>
+         </ul>
+         <p>The input type for these methods can reflect the external source’s 
API (e.g. <a 
href="https://kafka.apache.org/27/javadoc/?org/apache/kafka/common/TopicPartition.html";>Kafka
 TopicPartition</a> should use a <strong>Beam-implemented</strong> 
TopicPartition object).
+         <p>Sometimes, there may be multiple <strong>from</strong> locations that use the same input type, which means method overloading cannot be used. In that case, add a new, explicitly named method:
+         <ul>
+            <li>IO.Read from(String table)
+            <li>IO.Read fromQuery(String query)
+            </li>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>IO.Read.fromABC(String abc)
+      </td>
+      <td>
+         <p><strong>This pattern is discouraged; use it only when method overloading is impossible</strong> (i.e. multiple kinds of sources share the same input type). Refer to the <a href="#bookmark=id.2ptx93mbewv2">Read.from(source) guidance</a>.
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>IO.Write.to(destination)
+      </td>
+      <td>
+         <p>A Write transform must provide a <strong>to</strong> method where users can specify where to write data. If a transform can write to different <em>kinds</em> of destinations while still using the same input element type (e.g. tables, queries, topics, partitions), then multiple implementations of this to method can be provided to accommodate this:
+         <ul>
+            <li>IO.Write to(Query query)
+            <li>IO.Write to(Table table) / to(String table)
+            <li>IO.Write to(Topic topic)
+            <li>IO.Write to(Partition partition)
+            </li>
+         </ul>
+         <p>The input type for these methods can use an Apache Beam utility that reflects the external sink’s API (e.g. <a href="https://kafka.apache.org/27/javadoc/?org/apache/kafka/common/TopicPartition.html";>Kafka TopicPartition</a>). As noted in the general guidance above, Apache Beam should not expose external libraries in its API; a Beam-implemented equivalent should be used instead.
+         <p>If different kinds of destinations require different types of 
input object types, then these should be done in separate I/O connectors.
+         <p>Sometimes, there may be multiple <strong>to</strong> destinations that use the same input type, which means method overloading cannot be used. In that case, add a new, explicitly named method:
+         <ul>
+            <li>IO.Write to(String table)
+            <li>IO.Write toTable(String table)
+            </li>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>IO.Write.to(DynamicDestination destination)
+      </td>
+      <td>
+         <p>A write transform may enable writing to more than one destination. 
This can be a complicated pattern that should be implemented carefully (it is 
the preferred pattern for connectors that will likely have multiple 
destinations in a single pipeline).
+         <p>The preferred pattern for this is to define a DynamicDestinations 
interface (e.g. <a 
href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/DynamicDestinations.html";>BigQueryIO.DynamicDestinations</a>)
 that will allow the user to define all necessary parameters for the 
configuration of the destination.
+         <p>The DynamicDestinations interface also allows maintainers to add 
new methods over time (with <strong>default implementations</strong> to avoid 
breaking existing users) that will define extra configuration parameters if 
necessary.
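+         <p>A sketch of such an interface (names hypothetical, modeled loosely on BigQueryIO.DynamicDestinations):
+         <p>{{< highlight java >}}
+public interface DynamicDestinations<T> extends Serializable {
+  // Route each element to a destination.
+  String getDestination(T element);
+
+  // Added later with a default implementation, so existing
+  // user implementations keep compiling.
+  default Map<String, String> getDestinationConfig(String destination) {
+    return Collections.emptyMap();
+  }
+}
+{{< /highlight >}}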
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>IO.Write.toABC(destination)
+      </td>
+      <td>
+         <p><strong>This pattern is discouraged; use it only when method overloading is impossible</strong> (i.e. multiple kinds of destinations share the same input type). Refer to the <a href="#bookmark=id.nwnj8l50zeyq">Write.to(destination) guidance</a>.
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>class IO.Read.withX
+         <p>IO.Write.withX
+      </td>
+      <td>
+         <p>withX provides a method for configuration to be passed to the Read method, where X represents the configuration to create. With the exception of generic with statements (defined below), the I/O should attempt to match the name of the configuration option to the option’s name in the source system.
+         <p>These methods should return a new instance of the I/O rather than 
modifying the existing instance.
+         <p>Example:
+         <p><a 
href="https://beam.apache.org/releases/javadoc/2.35.0/org/apache/beam/sdk/io/TextIO.Read.html#withCompression-org.apache.beam.sdk.io.Compression-";>TextIO.Read.withCompression</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>IO.Read.withConfigObject
+         <p>IO.Write.withConfigObject
+      </td>
+      <td>
+         <p>Some connectors in Java receive a configuration object as part of their setup. <strong>This pattern is encouraged only for particular cases</strong>; in most cases, a connector can hold all necessary configuration parameters at the top level.
+         <p>To determine whether a multi-parameter configuration object is an 
appropriate parameter for a high level transform, the configuration object must:
+         <ul>
+            <li>Hold only properties related to the connection/authentication 
parameters for the external data store (e.g. JdbcIO.DataSourceConfiguration).
+            <ul>
+               <li>Generally, <strong>secrets should not be passed as 
parameters</strong>, unless an alternative is not feasible. For secret 
management, a secret-management service or KMS is the recommended approach.
+            </ul>
+            <li><strong>Or </strong>mirror an API characteristic from the 
external data source (e.g. KafkaIO.Read.withConsumerConfigUpdates), without 
exposing that external API in the Beam API.
+            <ul>
+               <li>The method should mirror the name of the API object (e.g. 
given an object SubscriptionStatConfig, a method would be 
withSubscriptionStatConfig).
+            </ul>
+            <li><strong>Or</strong> when a connector can support different configuration ‘paths’, where a particular property requires other properties to be specified (e.g. BigQueryIO’s write methods entail different sets of properties; see the examples below).
+            </li>
+         </ul>
+         <p>Example:
+         <p><a 
href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/jdbc/JdbcIO.DataSourceConfiguration.html";>JdbcIO.DataSourceConfiguration</a>,
 <a 
href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/spanner/SpannerConfig.html";>SpannerConfig</a>,
 <a 
href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kafka/KafkaIO.ReadSourceDescriptors.html#withConsumerConfigUpdates-java.util.Map-";>KafkaIO.Read.withConsumerConfigUpdates</a>
+         <p>{{< highlight java >}}
+BigQueryIO.write()
+    .withWriteConfig(
+        FileLoadsConfig.withAvro()
+            .withTriggeringFrequency(...))
+
+BigQueryIO.write()
+    .withWriteConfig(
+        StreamingInsertsConfig.withDetailedError()
+            .withExactlyOnce()
+            ...)
+            {{< /highlight >}}
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>class IO.Write.withFormatFunction
+      </td>
+      <td>
+         <p><strong>Discouraged - except for dynamic destinations</strong>
+         <p>For sinks that can receive Beam Row-typed PCollections, a format function should not be necessary, because Beam can format the input data based on its schema.
+         <p>For sinks providing Dynamic Destination functionality, elements may carry data that helps determine their destination. This data may need to be removed before writing to the final destination.
+         <p>To include this method, a connector should:
+         <ul>
+            <li>Show that it’s not possible to perform data matching 
automatically
+            <li>Or support Dynamic Destinations and need changes to the input data for that reason
+            </li>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>IO.Read.withCoder
+         <p>IO.Write.withCoder
+      </td>
+      <td>
+         <p><strong>Strongly Discouraged</strong>
+         <p>Sets the coder to use to encode/decode the element type of the 
output / input PCollection of this connector. In general, it is recommended 
that sources will:
+         <ol>
+            <li>Return Row objects with a schema that is automatically inferred
+            <li>Automatically set the necessary coder by having fixed 
output/input types, or inferring their output/input types
+            </li>
+         </ol>
+         <p>If neither #1 nor #2 is possible, then a withCoder method can be added.
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>IO.ABC.withEndpoint / with{IO}Client / withClient
+      </td>
+      <td>
+         <p>Connector transforms should provide a method to override the 
interface between themselves and the external system that they communicate 
with. This can enable various uses:
+         <ul>
+            <li>Local testing by mocking the destination service
+            <li>User-enabled metrics, monitoring, and security handling in the 
client.
+            <li>Integration testing based on emulators
+            </li>
+         </ul>
+         <p>Example:
+         <p><a 
href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html#withTestServices-org.apache.beam.sdk.io.gcp.bigquery.BigQueryServices-";>BigQueryIO.Write.withTestServices</a>(<a
 
href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServices.html";>BigQueryServices</a>)
+      </td>
+   </tr>
+</table>
+</div>
+
+
+
+#### Types
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-io-standards">
+   <tr>
+      <th>
+         <p>Java Syntax
+      </th>
+      <th>
+         <p>Semantics
+      </th>
+   </tr>
+   <tr>
+      <td>
+         <p>Method IO.Read.expand
+      </td>
+      <td>
+         <p>The expand method of a Read transform must return a PCollection 
object with a type. The type may be parameterized or fixed to a class.
+         <p>A user should <strong>not</strong> create this class directly. It 
should be created by a <a href="#bookmark=id.kafer4mjzh1m">top-level utility 
method</a>.

Review Comment:
   fixed the link


