Repository: beam-site Updated Branches: refs/heads/asf-site cb6d7d77e -> b5748765f
Add Pipeline I/O section to website - outline + move some existing content

* I did not go with a single page for all this content because both Java and Python have enough unique content that they deserve their own separate sections (i.e., tabs on the code alone aren't enough). The "click to the next page" model currently implemented lets the user pick Java vs. Python; after reading those pages, the next page for both points at the same place. Users mostly follow the same path, diverging for Java- or Python-specific content and then converging again.
* I moved the "list of built-in I/O" content over to its own separate page since it'd be nice to have more content there (e.g. a capabilities matrix), and it felt special enough to pull out of the programming guide.
* We decided not to put all of this content in the contribute section of the site, since we don't expect all users to contribute their I/O transforms. Most of the docs should just be about writing an I/O transform, with the contribution expectations laid out in the contribute part of the I/O section.
Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/f2171885 Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/f2171885 Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/f2171885 Branch: refs/heads/asf-site Commit: f21718850c645c83767f9787d335964da142fda9 Parents: cb6d7d7 Author: Stephen Sisk <[email protected]> Authored: Wed Mar 8 17:49:37 2017 -0800 Committer: Davor Bonaci <[email protected]> Committed: Fri Mar 17 18:33:43 2017 -0700 ---------------------------------------------------------------------- src/_includes/header.html | 1 + src/documentation/io/authoring-java.md | 15 ++++++ src/documentation/io/authoring-overview.md | 44 ++++++++++++++++++ src/documentation/io/authoring-python.md | 18 ++++++++ src/documentation/io/built-in.md | 61 +++++++++++++++++++++++++ src/documentation/io/contributing.md | 15 ++++++ src/documentation/io/io-toc.md | 26 +++++++++++ src/documentation/io/testing.md | 19 ++++++++ src/documentation/programming-guide.md | 54 ++-------------------- src/documentation/sdks/java.md | 21 +-------- 10 files changed, 204 insertions(+), 70 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/beam-site/blob/f2171885/src/_includes/header.html ---------------------------------------------------------------------- diff --git a/src/_includes/header.html b/src/_includes/header.html index 28000d8..1ea3496 100644 --- a/src/_includes/header.html +++ b/src/_includes/header.html @@ -42,6 +42,7 @@ <li><a href="{{ site.baseurl }}/documentation/pipelines/design-your-pipeline/">Design Your Pipeline</a></li> <li><a href="{{ site.baseurl }}/documentation/pipelines/create-your-pipeline/">Create Your Pipeline</a></li> <li><a href="{{ site.baseurl }}/documentation/pipelines/test-your-pipeline/">Test Your Pipeline</a></li> + <li><a href="{{ site.baseurl }}/documentation/io/io-toc/">Pipeline I/O</a></li> 
<li role="separator" class="divider"></li> <li class="dropdown-header">SDKs</li> <li><a href="{{ site.baseurl }}/documentation/sdks/java/">Java SDK</a></li> http://git-wip-us.apache.org/repos/asf/beam-site/blob/f2171885/src/documentation/io/authoring-java.md ---------------------------------------------------------------------- diff --git a/src/documentation/io/authoring-java.md b/src/documentation/io/authoring-java.md new file mode 100644 index 0000000..6cdb6bd --- /dev/null +++ b/src/documentation/io/authoring-java.md @@ -0,0 +1,15 @@ +--- +layout: default +title: "Authoring I/O Transforms - Java" +permalink: /documentation/io/authoring-java/ +--- + +[Pipeline I/O Table of Contents]({{site.baseurl}}/documentation/io/io-toc/) + +# Authoring I/O Transforms - Java + +> Note: This guide is still in progress. There is an open issue to finish the guide: [BEAM-1025](https://issues.apache.org/jira/browse/BEAM-1025). + +# Next steps + +[Testing I/O Transforms]({{site.baseurl }}/documentation/io/testing/) http://git-wip-us.apache.org/repos/asf/beam-site/blob/f2171885/src/documentation/io/authoring-overview.md ---------------------------------------------------------------------- diff --git a/src/documentation/io/authoring-overview.md b/src/documentation/io/authoring-overview.md new file mode 100644 index 0000000..dab6a85 --- /dev/null +++ b/src/documentation/io/authoring-overview.md @@ -0,0 +1,44 @@ +--- +layout: default +title: "Authoring I/O Transforms - Overview" +permalink: /documentation/io/authoring-overview/ +--- + +[Pipeline I/O Table of Contents]({{site.baseurl}}/documentation/io/io-toc/) + +# Authoring I/O Transforms - Overview + +_A guide for users who need to connect to a data store that isn't supported by the [Built-in I/O Transforms]({{site.baseurl }}/documentation/io/built-in/)_ + +> Note: This guide is still in progress. There is an open issue to finish the guide: [BEAM-1025](https://issues.apache.org/jira/browse/BEAM-1025). 
+ +* TOC +{:toc} + +## Introduction +TODO + +## Example I/O Transforms +TODO + +## Suggested steps for implementers +TODO + +## Read transforms +TODO + +### When to implement using the Source API +TODO + +## Write transforms +TODO + +### When to implement using the Sink API +TODO + +# Next steps + +For more details on actual implementation, continue with one of the the language specific guides: + +* [Authoring I/O Transforms - Python]({{site.baseurl }}/documentation/io/authoring-python/) +* [Authoring I/O Transforms - Java]({{site.baseurl }}/documentation/io/authoring-java/) http://git-wip-us.apache.org/repos/asf/beam-site/blob/f2171885/src/documentation/io/authoring-python.md ---------------------------------------------------------------------- diff --git a/src/documentation/io/authoring-python.md b/src/documentation/io/authoring-python.md new file mode 100644 index 0000000..b6ccc56 --- /dev/null +++ b/src/documentation/io/authoring-python.md @@ -0,0 +1,18 @@ +--- +layout: default +title: "Authoring I/O Transforms - Python" +permalink: /documentation/io/authoring-python/ +--- + +[Pipeline I/O Table of Contents]({{site.baseurl}}/documentation/io/io-toc/) + +# Authoring I/O Transforms - Python + +> Note: This guide is still in progress. There is an open issue to finish the guide: [BEAM-1025](https://issues.apache.org/jira/browse/BEAM-1025). 
+ +TODO - move in the [current python SDK content]({{site.baseurl}}/documentation/sdks/python-custom-io/) + + +# Next steps + +[Testing I/O Transforms]({{site.baseurl}}/documentation/io/testing/) http://git-wip-us.apache.org/repos/asf/beam-site/blob/f2171885/src/documentation/io/built-in.md ---------------------------------------------------------------------- diff --git a/src/documentation/io/built-in.md b/src/documentation/io/built-in.md new file mode 100644 index 0000000..9f96968 --- /dev/null +++ b/src/documentation/io/built-in.md @@ -0,0 +1,61 @@ +--- +layout: default +title: "Built-in I/O Transforms" +permalink: /documentation/io/built-in/ +--- + +[Pipeline I/O Table of Contents]({{site.baseurl}}/documentation/io/io-toc/) + +# Built-in I/O Transforms + +This table contains the currently available I/O transforms. + +Consult the [Programming Guide I/O section]({{site.baseurl }}/documentation/programming-guide#io) for general usage instructions, and see the javadoc/pydoc for the particular I/O transforms. 
+ + +<table class="table table-bordered"> +<tr> + <th>Language</th> + <th>File-based</th> + <th>Messaging</th> + <th>Database</th> +</tr> +<tr> + <td>Java</td> + <td> + <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java">AvroIO</a></p> + <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/hdfs">Apache Hadoop HDFS</a></p> + <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java">TextIO</a></p> + <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/">XML</a></p> + </td> + <td> + <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/jms">JMS</a></p> + <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/kafka">Apache Kafka</a></p> + <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/kinesis">Amazon Kinesis</a></p> + <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io">Google Cloud PubSub</a></p> + </td> + <td> + <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/hbase">Apache HBase</a></p> + <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/mongodb">MongoDB</a></p> + <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/jdbc">JDBC</a></p> + <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery">Google BigQuery</a></p> + <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable">Google Cloud Bigtable</a></p> + <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore">Google Cloud Datastore</a></p> + </td> +</tr> +<tr> + <td>Python</td> + <td> + <p><a 
href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/avroio.py">avroio</a></p> + <p><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/textio.py">textio</a></p> + </td> + <td> + </td> + <td> + <p><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py">Google BigQuery</a></p> + <p><a href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/io/gcp/datastore">Google Cloud Datastore</a></p> + </td> + +</tr> +</table> + http://git-wip-us.apache.org/repos/asf/beam-site/blob/f2171885/src/documentation/io/contributing.md ---------------------------------------------------------------------- diff --git a/src/documentation/io/contributing.md b/src/documentation/io/contributing.md new file mode 100644 index 0000000..949db3c --- /dev/null +++ b/src/documentation/io/contributing.md @@ -0,0 +1,15 @@ +--- +layout: default +title: "Contributing I/O Transforms" +permalink: /documentation/io/contributing/ +--- + +[Pipeline I/O Table of Contents]({{site.baseurl}}/documentation/io/io-toc/) + +# Contributing I/O Transforms + +* If you are planning to contribute your I/O transform to the Apache Beam community, you'll be going through the normal Beam contribution life cycle - see the [Apache Beam Contribution Guide]({{ site.baseurl }}/contribute/contribution-guide/) for more details. +* Talk to the community! +* Make sure you've implemented the appropriate tests as discussed in the [Testing I/O Transforms]({{site.baseurl }}/documentation/io/testing/) section. + +> Note: This guide is still in progress. There is an open issue to finish the guide: [BEAM-1025](https://issues.apache.org/jira/browse/BEAM-1025). 
http://git-wip-us.apache.org/repos/asf/beam-site/blob/f2171885/src/documentation/io/io-toc.md ---------------------------------------------------------------------- diff --git a/src/documentation/io/io-toc.md b/src/documentation/io/io-toc.md new file mode 100644 index 0000000..ec6b244 --- /dev/null +++ b/src/documentation/io/io-toc.md @@ -0,0 +1,26 @@ +--- +layout: default +title: "Pipeline I/O" +permalink: /documentation/io/io-toc/ +--- + +# Pipeline I/O + +## Using Pipeline I/O +* [Programming Guide: Using I/O Transforms]({{site.baseurl }}/documentation/programming-guide#io) +* [Built-in I/O Transforms]({{site.baseurl }}/documentation/io/built-in/) + + +## Authoring Read & Write I/O Transforms + +> Note: This guide is still in progress. There is an open issue to finish the guide: [BEAM-1025](https://issues.apache.org/jira/browse/BEAM-1025). + +<!-- TODO: commented out until this content is ready. + +This series of articles will walk you through the process of creating a new I/O transform. 
+ +* [Authoring I/O Transforms - Overview]({{site.baseurl }}/documentation/io/authoring-overview/) +* [Authoring I/O Transforms - Python]({{site.baseurl }}/documentation/io/authoring-python/) +* [Authoring I/O Transforms - Java]({{site.baseurl }}/documentation/io/authoring-java/) +* [Testing I/O Transforms]({{site.baseurl }}/documentation/io/testing/) +* [Contributing I/O Transforms]({{site.baseurl }}/documentation/io/contributing/) --> http://git-wip-us.apache.org/repos/asf/beam-site/blob/f2171885/src/documentation/io/testing.md ---------------------------------------------------------------------- diff --git a/src/documentation/io/testing.md b/src/documentation/io/testing.md new file mode 100644 index 0000000..e43c628 --- /dev/null +++ b/src/documentation/io/testing.md @@ -0,0 +1,19 @@ +--- +layout: default +title: "Testing I/O Transforms" +permalink: /documentation/io/testing/ +--- + +[Pipeline I/O Table of Contents]({{site.baseurl}}/documentation/io/io-toc/) + +# Testing I/O Transforms + +> Note: This guide is still in progress. There is an open issue to finish the guide: [BEAM-1025](https://issues.apache.org/jira/browse/BEAM-1025). + + +# Next steps + +If you have a well tested I/O transform, why not contribute it to Apache Beam? 
Read all about it: + +[Contributing I/O Transforms]({{site.baseurl }}/documentation/io/contributing/) + http://git-wip-us.apache.org/repos/asf/beam-site/blob/f2171885/src/documentation/programming-guide.md ---------------------------------------------------------------------- diff --git a/src/documentation/programming-guide.md b/src/documentation/programming-guide.md index 65a3062..57b49e8 100644 --- a/src/documentation/programming-guide.md +++ b/src/documentation/programming-guide.md @@ -921,9 +921,8 @@ While `ParDo` always produces a main output `PCollection` (as the return value f ## <a name="io"></a>Pipeline I/O -When you create a pipeline, you often need to read data from some external source, such as a file in external data sink or a database. Likewise, you may want your pipeline to output its result data to a similar external data sink. Beam provides read and write transforms for a number of common data storage types. If you want your pipeline to read from or write to a data storage format that isn't supported by the built-in transforms, you can implement your own read and write transforms. +When you create a pipeline, you often need to read data from some external source, such as a file in external data sink or a database. Likewise, you may want your pipeline to output its result data to a similar external data sink. Beam provides read and write transforms for a [number of common data storage types]({{site.baseurl }}/documentation/io/built-in/). If you want your pipeline to read from or write to a data storage format that isn't supported by the built-in transforms, you can [implement your own read and write transforms]({{site.baseurl }}/documentation/io/io-toc/). -> A guide that covers how to implement your own Beam IO transforms is in progress ([BEAM-1025](https://issues.apache.org/jira/browse/BEAM-1025)). 
### Reading input data @@ -988,55 +987,8 @@ records.apply("WriteToText", %} ``` -### Beam-provided I/O APIs - -See the language specific source code directories for the Beam supported I/O APIs. Specific documentation for each of these I/O sources will be added in the future. ([BEAM-1054](https://issues.apache.org/jira/browse/BEAM-1054)) - -<table class="table table-bordered"> -<tr> - <th>Language</th> - <th>File-based</th> - <th>Messaging</th> - <th>Database</th> -</tr> -<tr> - <td>Java</td> - <td> - <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java">AvroIO</a></p> - <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/hdfs">HDFS</a></p> - <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java">TextIO</a></p> - <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/">XML</a></p> - </td> - <td> - <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/jms">JMS</a></p> - <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/kafka">Kafka</a></p> - <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/kinesis">Kinesis</a></p> - <p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io">Google Cloud PubSub</a></p> - </td> - <td> - <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/hbase">Apache HBase</a></p> - <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/mongodb">MongoDB</a></p> - <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/jdbc">JDBC</a></p> - <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery">Google BigQuery</a></p> - <p><a 
href="https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable">Google Cloud Bigtable</a></p> - <p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore">Google Cloud Datastore</a></p> - </td> -</tr> -<tr> - <td>Python</td> - <td> - <p><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/avroio.py">avroio</a></p> - <p><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/textio.py">textio</a></p> - </td> - <td> - </td> - <td> - <p><a href="https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py">Google BigQuery</a></p> - <p><a href="https://github.com/apache/beam/tree/master/sdks/python/apache_beam/io/gcp/datastore">Google Cloud Datastore</a></p> - </td> - -</tr> -</table> +### Beam-provided I/O Transforms +See the [Beam-provided I/O Transforms]({{site.baseurl }}/documentation/io/built-in/) page for a list of the currently available I/O transforms. ## <a name="running"></a>Running the pipeline http://git-wip-us.apache.org/repos/asf/beam-site/blob/f2171885/src/documentation/sdks/java.md ---------------------------------------------------------------------- diff --git a/src/documentation/sdks/java.md b/src/documentation/sdks/java.md index 1a3d856..474dc93 100644 --- a/src/documentation/sdks/java.md +++ b/src/documentation/sdks/java.md @@ -21,22 +21,5 @@ See the [Java API Reference]({{ site.baseurl }}/documentation/sdks/javadoc/) for The Java SDK supports all features currently supported by the Beam model. 
-## Supported IO Connectors - -* Amazon Kinesis -* Apache Hadoop's `FileInputFormat` in Hadoop Distributed File System (HDFS) -* Apache HBase -* Apache Kafka -* Avro Files -* Google BigQuery -* Google Cloud Bigtable -* Google Cloud Datastore -* Google Cloud Pub/Sub -* Google Cloud Storage -* Java Database Connectivity (JDBC) -* Java Message Service (JMS) -* MongoDB -* Text Files -* XML Files - - +## Pipeline I/O +See the [Beam-provided I/O Transforms]({{site.baseurl }}/documentation/io/built-in/) page for a list of the currently available I/O transforms.
