hermanmak commented on code in PR #24962:
URL: https://github.com/apache/beam/pull/24962#discussion_r1080762188


##########
website/www/site/content/en/documentation/io/io-standards.md:
##########
@@ -0,0 +1,1452 @@
+---
+title: "IO Standards"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# I/O Standards
+
+## Overview
+
+This Apache Beam I/O Standards document provides prescriptive guidance for first-party (1P) and third-party (3P) developers building an Apache Beam I/O connector. The guidelines aim to establish simple, concise best practices for documentation, development, and testing.
+
+
+### What are built-in I/O Connectors?
+
+An I/O connector (I/O) that lives in the Apache Beam GitHub repository is known as a **Built-in I/O connector**. Built-in I/Os have their [integration tests](#integration-tests) and performance tests run routinely by the Google Cloud Dataflow Team using the Dataflow Runner, with metrics published publicly for [reference](#dashboard). Unless explicitly stated otherwise, the following guidelines apply to both built-in and non-built-in I/Os.
+
+
+# Guidance
+
+
+## Documentation
+
+This section lays out the superset of all documentation that is expected to be made available with an I/O. The Apache Beam documentation referenced throughout this section can be found [here](https://beam.apache.org/documentation/). A good example to follow is the built-in [Snowflake I/O](https://beam.apache.org/documentation/io/built-in/snowflake/).
+
+
+### Built-in I/O
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-io-standards">
+   <tr>
+      <td>
+         <p>Provide code docs for the relevant language of the I/O. These should also link to any external sources of information, whether within the Apache Beam site or at an external documentation location.
+         <p>Examples:
+         <ul>
+            <li><a href="https://beam.apache.org/releases/javadoc/current/overview-summary.html">Java doc</a>
+            <li><a href="https://beam.apache.org/releases/pydoc/current/">Python doc</a>
+            <li><a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam">Go doc</a>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Add a new page under <strong>I/O connector guides</strong> that covers specific tips and configurations. Existing guides include those for <a href="https://beam.apache.org/documentation/io/built-in/parquet/">Parquet</a>, <a href="https://beam.apache.org/documentation/io/built-in/hadoop/">Hadoop</a> and others.
+         <p>Examples:
+         <p><img src="/images/io-standards/io-connector-guides-screenshot.png" 
width="" alt="I/O connector guides screenshot" title="I/O connector guides 
screenshot"></img>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Formatting of the section headers in your Javadoc/Pythondoc should 
be consistent throughout such that programmatic information extraction for 
other pages can be enabled in the future.
+         <p>Example <strong>subset</strong> of sections to include in your 
page in order:
+         <ol>
+            <li>Before you start
+            <li>{Connector}IO basics
+            <li>Supported Features
+               <ol>
+                  <li>Relational
+                  </li>
+               </ol>
+            <li>Authentication
+            <li>Reading from {Connector}
+            <li>Writing to {Connector}
+            <li><a href="#unit-tests">Resource scalability</a>
+            <li>Limitations
+            <li>Reporting an Issue
+            </li>
+         </ol>
+         <p>Example:
+         <p>The KafkaIO <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kafka/KafkaIO.html">JavaDoc</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>I/O Connectors should include a note indicating the <a href="https://2022.beamsummit.org/sessions/relational-beam/">Relational Features</a> they support in their page under <strong>I/O connector guides</strong>.
+         <p>Relational Features are concepts that can help improve efficiency and can optionally be implemented by an I/O Connector. Using end-user-supplied pipeline configuration (<a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/io/SchemaIO.html">SchemaIO</a>) and user query (<a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/FieldAccessDescriptor.html">FieldAccessDescriptor</a>) data, relational theory is applied to derive improvements such as faster pipeline execution, lower operation costs and less data read or written.
+         <p>Example table:
+         <p><img 
src="/images/io-standards/io-supported-relational-features-table.png" width="" 
alt="Supported Relational Features" title="Supported Relational Features"></img>
+         <p>Example implementations:
+         <p>BigQueryIO <a href="https://github.com/apache/beam/blob/5bb13fa35b9bc36764895c57f23d3890f0f1b567/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1813">Column Pruning</a> via ProjectionPushdown to return only the columns required by an end user's query. This is achieved using the BigQuery DirectRead API.
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Add a page under <strong>Common pipeline patterns</strong>, if 
necessary, outlining common usage patterns involving your I/O.
+         <p><a href="https://beam.apache.org/documentation/patterns/bigqueryio/">https://beam.apache.org/documentation/patterns/bigqueryio/</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Update <strong>I/O Connectors</strong> with your I/O’s information
+         <p>Example:
+         <p><a href="https://beam.apache.org/documentation/io/connectors/#built-in-io-connectors">https://beam.apache.org/documentation/io/connectors/#built-in-io-connectors</a>
+         <p><img src="/images/io-standards/io-supported-via-screenshot.png" width="" alt="I/O connectors supported-via screenshot" title="I/O connectors supported-via screenshot">
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Provide setup steps to use the I/O, under a <strong>Before you start</strong> header.
+         <p>Example:
+         <p><a href="https://beam.apache.org/documentation/io/built-in/parquet/#before-you-start">https://beam.apache.org/documentation/io/built-in/parquet/#before-you-start</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Include a canonical read/write code snippet after the initial description for each supported language. The example below is the Hadoop guide, with snippets for Java.
+         <p>Example:
+         <p><a href="https://beam.apache.org/documentation/io/built-in/hadoop/#reading-using-hadoopformatio">https://beam.apache.org/documentation/io/built-in/hadoop/#reading-using-hadoopformatio</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Indicate how timestamps for elements are assigned. This includes 
batch sources to allow for future I/Os which may provide more useful 
information than current_time().
+         <p>Example:
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Indicate how timestamps are advanced; for Batch sources this will 
be marked as n/a in most cases.
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Outline any temporary resources (for example, files) that the 
connector will create.
+         <p>Example:
+         <p>BigQuery batch loads first create a temp GCS location
+         <p><a href="https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L455">https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L455</a>
+      </td>
+  </tr>
+  <tr>
+      <td>
+         <p>Provide, under an <strong>Authentication</strong> subheader, how 
to acquire partner authorization material to securely access the source/sink.
+         <p>Example:
+         <p><a href="https://beam.apache.org/documentation/io/built-in/snowflake/#authentication">https://beam.apache.org/documentation/io/built-in/snowflake/#authentication</a>
+         <p>BigQuery names this topic "permissions", but it covers similar ground:
+         <p><a href="https://beam.apache.org/releases/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html">https://beam.apache.org/releases/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html</a>
+      </td>
+  </tr>
+  <tr>
+      <td>
+         <p>I/Os should provide links to the Source/Sink documentation under the <strong>Before you start</strong> header.
+         <p>Example:
+         <p><a href="https://beam.apache.org/documentation/io/built-in/snowflake/">https://beam.apache.org/documentation/io/built-in/snowflake/</a>
+      </td>
+  </tr>
+  <tr>
+      <td>
+         <p>Indicate whether there is native or cross-language (X-language) support in each language, with a link to the docs.
+         <p>Example:
+         <p>Kinesis I/O has a native implementation in Java and X-language support for Python, but no support for Go.
+      </td>
+  </tr>
+  <tr>
+   <td>
+      <p>Indicate known limitations under a <strong>Limitations</strong> 
header. If the limitation has a tracking issue, please link it inline.
+      <p>Example:
+      <p><a href="https://beam.apache.org/documentation/io/built-in/snowflake/#limitations">https://beam.apache.org/documentation/io/built-in/snowflake/#limitations</a>
+   </td>
+  </tr>
+</table>
+</div>
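
The table above asks for consistent section headers so that information can later be extracted programmatically. As a rough illustration only, the following Python sketch checks a connector guide against the example section order. It assumes the guide is a Markdown page with ATX (`#`) headers; the names `RECOMMENDED_SECTIONS`, `extract_headers` and `check_section_order` are hypothetical and are not part of Apache Beam.

```python
import re

# Example section order from the list above. "{Connector}" is a
# placeholder for the connector name, for example "Kafka".
RECOMMENDED_SECTIONS = [
    "Before you start",
    "{Connector}IO basics",
    "Supported Features",
    "Authentication",
    "Reading from {Connector}",
    "Writing to {Connector}",
    "Resource scalability",
    "Limitations",
    "Reporting an Issue",
]

# Matches Markdown ATX headers such as "## Limitations".
# Assumption: '#' lines inside fenced code blocks are not filtered out.
HEADER_PATTERN = re.compile(r"^#{1,6}[ \t]+(.*?)[ \t]*$", re.MULTILINE)


def extract_headers(markdown_text):
    """Return the text of every ATX header, in document order."""
    return [match.group(1) for match in HEADER_PATTERN.finditer(markdown_text)]


def check_section_order(markdown_text, connector):
    """Return (missing, in_order) for the recommended sections.

    missing:  recommended sections that do not appear as headers.
    in_order: True if the sections that do appear follow the
              recommended order.
    """
    expected = [s.replace("{Connector}", connector) for s in RECOMMENDED_SECTIONS]
    headers = extract_headers(markdown_text)
    missing = [s for s in expected if s not in headers]
    positions = [headers.index(s) for s in expected if s in headers]
    return missing, positions == sorted(positions)
```

Calling `check_section_order(page_text, "Kafka")` on a guide's Markdown source returns the recommended sections that are absent and whether the ones present appear in the recommended order. Since the list above is described as an example subset, a real check would likely treat missing sections as warnings rather than failures.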
+
+
+
+### I/O (not built-in)
+
+Custom I/Os are not included in the Apache Beam GitHub repository. One example is [SolaceIO](https://github.com/SolaceProducts/solace-apache-beam).
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-connectors">
+   <tr>
+      <td>
+         <p>Update I/O connectors with your I/O's information.
+         <p>Example:
+         <p><a href="https://beam.apache.org/documentation/io/connectors/#other-io-connectors-for-apache-beam">https://beam.apache.org/documentation/io/connectors/#other-io-connectors-for-apache-beam</a>
+      </td>
+   </tr>
+</table>
+</div>
+
+## Development
+
+This section outlines API Syntax, Semantics and Feature Adoption 
recommendations for new and existing Apache Beam I/O Connectors.
+
+Development guidelines are written with the following principles in mind:
+
+
+
+* Consistency makes an API easier to learn

Review Comment:
   added.



##########
website/www/site/content/en/documentation/io/io-standards.md:
##########
@@ -0,0 +1,1452 @@
+* Consistency makes an API easier to learn
+    * If there are multiple ways of doing something, we should strive to be 
consistent first
+* With a couple of minutes of studying documentation, users should be able to pick up most I/O connectors

Review Comment:
   added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
