This is an automated email from the ASF dual-hosted git repository.
tvalentyn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/master by this push:
new d38c2d77e7f BEAM-13582 Fixing broken links in the documentation (#17300)
d38c2d77e7f is described below
commit d38c2d77e7f23711f8964028bc7210fc087136d4
Author: rszper <[email protected]>
AuthorDate: Thu Apr 21 04:36:05 2022 -0700
BEAM-13582 Fixing broken links in the documentation (#17300)
---
CHANGES.md | 2 +-
website/www/site/content/en/blog/beam-2.29.0.md | 2 +-
website/www/site/content/en/blog/beam-a-look-back.md | 2 +-
website/www/site/content/en/blog/beam-summit-digital-2020.md | 4 ++--
website/www/site/content/en/blog/beam-summit-europe-2019.md | 2 +-
.../site/content/en/blog/review-input-streaming-connectors.md | 10 +++++-----
website/www/site/content/en/contribute/become-a-committer.md | 4 ++--
website/www/site/content/en/contribute/release-guide.md | 2 +-
.../content/en/documentation/io/built-in/google-bigquery.md | 2 +-
website/www/site/content/en/documentation/io/testing.md | 4 ++--
website/www/site/content/en/documentation/programming-guide.md | 2 +-
.../content/en/documentation/resources/learning-resources.md | 3 +--
website/www/site/content/en/documentation/runners/direct.md | 2 +-
website/www/site/content/en/documentation/runners/jstorm.md | 4 +---
.../www/site/content/en/documentation/runtime/environments.md | 2 +-
15 files changed, 22 insertions(+), 25 deletions(-)
diff --git a/CHANGES.md b/CHANGES.md
index 8168402c859..8e70a27a1aa 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -441,7 +441,7 @@
## New Features / Improvements
* DataFrame API now supports pandas 1.2.x
([BEAM-11531](https://issues.apache.org/jira/browse/BEAM-11531)).
-* Multiple DataFrame API bugfixes ([BEAM-12071](https://issues.apache/jira/browse/BEAM-12071), [BEAM-11929](https://issues.apache/jira/browse/BEAM-11929))
+* Multiple DataFrame API bugfixes ([BEAM-12071](https://issues.apache.org/jira/browse/BEAM-12071), [BEAM-11929](https://issues.apache.org/jira/browse/BEAM-11929))
## Breaking Changes
diff --git a/website/www/site/content/en/blog/beam-2.29.0.md b/website/www/site/content/en/blog/beam-2.29.0.md
index e2b3e1c9694..4bf577d77b1 100644
--- a/website/www/site/content/en/blog/beam-2.29.0.md
+++ b/website/www/site/content/en/blog/beam-2.29.0.md
@@ -42,7 +42,7 @@ For more information on changes in 2.29.0, check out the
[detailed release notes
### New Features / Improvements
* DataFrame API now supports pandas 1.2.x
([BEAM-11531](https://issues.apache.org/jira/browse/BEAM-11531)).
-* Multiple DataFrame API bugfixes ([BEAM-12071](https://issues.apache/jira/browse/BEAM-12071), [BEAM-11929](https://issues.apache/jira/browse/BEAM-11929))
+* Multiple DataFrame API bugfixes ([BEAM-12071](https://issues.apache.org/jira/browse/BEAM-12071), [BEAM-11929](https://issues.apache.org/jira/browse/BEAM-11929))
* DDL supported in SQL transforms
([BEAM-11850](https://issues.apache.org/jira/browse/BEAM-11850))
* Upgrade Flink runner to Flink version 1.12.2
([BEAM-11941](https://issues.apache.org/jira/browse/BEAM-11941))
diff --git a/website/www/site/content/en/blog/beam-a-look-back.md b/website/www/site/content/en/blog/beam-a-look-back.md
index 87d1bc217ee..221272e94d1 100644
--- a/website/www/site/content/en/blog/beam-a-look-back.md
+++ b/website/www/site/content/en/blog/beam-a-look-back.md
@@ -61,7 +61,7 @@ new and updated runners were developed:
- Apache Spark 2.x update
- [IBM Streams
runner](https://www.ibm.com/blogs/bluemix/2017/10/streaming-analytics-updates-ibm-streams-runner-apache-beam-2-0/)
- MapReduce runner
- - [JStorm runner](http://jstorm.io/)
+ - [JStorm runner](https://github.com/alibaba/jstorm)
In addition to runners, Beam added new IO connectors, some notable ones being
the Cassandra, MQTT, AMQP, HBase/HCatalog, JDBC, Solr, Tika, Redis, and
diff --git a/website/www/site/content/en/blog/beam-summit-digital-2020.md b/website/www/site/content/en/blog/beam-summit-digital-2020.md
index b5e5333f6c7..903ee5a4410 100644
--- a/website/www/site/content/en/blog/beam-summit-digital-2020.md
+++ b/website/www/site/content/en/blog/beam-summit-digital-2020.md
@@ -45,8 +45,8 @@ As all things Beam, this is a community effort. The door is
open for participati
1. Submit a proposal to talk. Please check out the **[Call for
Papers](https://sessionize.com/beam-digital-summit-2020/)** and submit a talk.
The deadline for submissions is _June 15th_!
2. Register to join as an attendee. Registration is now open at the
**[registration page](https://crowdcast.io/e/beamsummit)**. Registration is
free!
-3. Consider sponsoring the event. If your company is interested in engaging
with members of the community please check out our [sponsoring
prospectus](https://drive.google.com/open?id=1EbijvZKpkWwWyMryLY9sJfyZzZk1k44v).
-4. Help us get the word out. Please make sure to let your colleagues and
friends in the data engineering field (and beyond!) know about the Beam Summit.
+<!--- 3. Consider sponsoring the event. If your company is interested in
engaging with members of the community please check out our sponsoring
prospectus.--->
+3. Help us get the word out. Please make sure to let your colleagues and
friends in the data engineering field (and beyond!) know about the Beam Summit.
## Follow up and more information
diff --git a/website/www/site/content/en/blog/beam-summit-europe-2019.md b/website/www/site/content/en/blog/beam-summit-europe-2019.md
index 0d303b202c0..7f5351f9efd 100644
--- a/website/www/site/content/en/blog/beam-summit-europe-2019.md
+++ b/website/www/site/content/en/blog/beam-summit-europe-2019.md
@@ -56,7 +56,7 @@ Keep an eye out for a meetup in
[Paris](https://www.meetup.com/Paris-Apache-Beam
If you are interested in starting your own meetup, feel free [to reach
out](https://beam.apache.org/community/contact-us)! Good places to start
include our Slack channel, the dev and user mailing lists, or the Apache Beam
Twitter.
-Even if you can’t travel to these meetups, you can stay informed on the
happenings of the community. The talks and sessions from previous conferences
and meetups are archived on the [Apache Beam YouTube
channel](https://www.youtube.com/c/ApacheBeamYT). If you want your session
added to the channel, don’t hesitate to get in touch! And in case you want to
attend the next Beam event in style, you can also order your swag on the [Beam
swag store](https://store-beam.myshopify.com)
+Even if you can’t travel to these meetups, you can stay informed on the
happenings of the community. The talks and sessions from previous conferences
and meetups are archived on the [Apache Beam YouTube
channel](https://www.youtube.com/c/ApacheBeamYT). If you want your session
added to the channel, don’t hesitate to get in touch!
## Summits
The first summit of the year will be held in Berlin:
diff --git a/website/www/site/content/en/blog/review-input-streaming-connectors.md b/website/www/site/content/en/blog/review-input-streaming-connectors.md
index 7c4f7a912c7..9cafe96747f 100644
--- a/website/www/site/content/en/blog/review-input-streaming-connectors.md
+++ b/website/www/site/content/en/blog/review-input-streaming-connectors.md
@@ -127,7 +127,7 @@ and <a
href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/stre
Beam has an official [Python SDK](/documentation/sdks/python/) that currently
supports a subset of the streaming features available in the Java SDK. Active
development is underway to bridge the gap between the featuresets in the two
SDKs. Currently for Python, the [Direct Runner](/documentation/runners/direct/)
and [Dataflow Runner](/documentation/runners/dataflow/) are supported, and
[several streaming options](/documentation/sdks/python-streaming/) were
introduced in beta in [version 2 [...]
-Spark also has a Python SDK called
[PySpark](https://spark.apache.org/docs/latest/api/python/pyspark.html). As
mentioned earlier, Scala code compiles to a bytecode that is executed by the
JVM. PySpark uses [Py4J](https://www.py4j.org/), a library that enables Python
programs to interact with the JVM and therefore access Java libraries, interact
with Java objects, and register callbacks from Java. This allows PySpark to
access native Spark objects like RDDs. Spark Structured Streaming sup [...]
+Spark also has a Python SDK called
[PySpark](https://spark.apache.org/docs/latest/api/python/index.html). As
mentioned earlier, Scala code compiles to a bytecode that is executed by the
JVM. PySpark uses [Py4J](https://www.py4j.org/), a library that enables Python
programs to interact with the JVM and therefore access Java libraries, interact
with Java objects, and register callbacks from Java. This allows PySpark to
access native Spark objects like RDDs. Spark Structured Streaming suppo [...]
Below are the main streaming input connectors for available for Beam and Spark
DStreams in Python:
@@ -149,7 +149,7 @@ Below are the main streaming input connectors for available
for Beam and Spark D
</td>
<td><a href="https://beam.apache.org/releases/pydoc/{{< param
release_latest >}}/apache_beam.io.textio.html">io.textio</a>
</td>
- <td><a
href="https://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext.textFileStream">textFileStream</a>
+ <td><a
href="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.streaming.StreamingContext.textFileStream.html">textFileStream</a>
</td>
</tr>
<tr>
@@ -158,7 +158,7 @@ Below are the main streaming input connectors for available
for Beam and Spark D
<td><a href="https://beam.apache.org/releases/pydoc/{{< param
release_latest >}}/apache_beam.io.hadoopfilesystem.html">io.hadoopfilesystem</a>
</td>
<td><a
href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#hadoopConfiguration--">hadoopConfiguration</a>
(Access through <code>sc._jsc</code> with Py4J)
-and <a
href="https://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext.textFileStream">textFileStream</a>
+and <a
href="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.streaming.StreamingContext.textFileStream.html">textFileStream</a>
</td>
</tr>
<tr>
@@ -184,7 +184,7 @@ and <a
href="https://spark.apache.org/docs/latest/api/python/pyspark.streaming.h
</td>
<td>N/A
</td>
- <td><a
href="https://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.kafka.KafkaUtils">KafkaUtils</a>
+ <td><a
href="https://spark.apache.org/docs/2.4.8/api/python/pyspark.streaming.html#pyspark.streaming.kafka.KafkaUtils">KafkaUtils</a>
</td>
</tr>
<tr>
@@ -192,7 +192,7 @@ and <a
href="https://spark.apache.org/docs/latest/api/python/pyspark.streaming.h
</td>
<td>N/A
</td>
- <td><a
href="https://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#module-pyspark.streaming.kinesis">KinesisUtils</a>
+ <td><a
href="https://spark.apache.org/docs/2.4.8/api/python/pyspark.streaming.html#pyspark.streaming.kinesis.KinesisUtils">KinesisUtils</a>
</td>
</tr>
<tr>
diff --git a/website/www/site/content/en/contribute/become-a-committer.md b/website/www/site/content/en/contribute/become-a-committer.md
index 126010fb7b6..e7a78560430 100644
--- a/website/www/site/content/en/contribute/become-a-committer.md
+++ b/website/www/site/content/en/contribute/become-a-committer.md
@@ -39,8 +39,8 @@ makes someone a committer via nomination, discussion, and
then majority vote.
We use data from as many sources as possible to inform our reasoning. Here are
some examples:
- - [dev@ archives](https://lists.apache.org/[email protected])
and [statistics](https://lists.apache.org/[email protected])
- - [user@ archives](https://lists.apache.org/[email protected])
and [statistics](https://lists.apache.org/[email protected])
+ - [dev@ archives](https://lists.apache.org/[email protected])
+ - [user@ archives](https://lists.apache.org/[email protected])
- [`apache-beam` StackOverflow
tag](https://stackoverflow.com/questions/tagged/apache-beam)
- Git metrics for [Beam](https://github.com/apache/beam/graphs/contributors)
- Code reviews given and received on
diff --git a/website/www/site/content/en/contribute/release-guide.md b/website/www/site/content/en/contribute/release-guide.md
index d47766638e0..1570306aba9 100644
--- a/website/www/site/content/en/contribute/release-guide.md
+++ b/website/www/site/content/en/contribute/release-guide.md
@@ -584,7 +584,7 @@ See the source of the script for more details, or to run
commands manually in ca
1. Select repository `orgapachebeam-NNNN`.
1. Click the Close button.
1. When prompted for a description, enter “Apache Beam, version X,
release candidate Y”.
- 1. Review all staged artifacts on
https://repository.apache.org/content/repositories/orgapachebeam-NNNN/.
+ 1. Review all staged artifacts on
`https://repository.apache.org/content/repositories/orgapachebeam-NNNN/`.
They should contain all relevant parts for each module, including
`pom.xml`, jar, test jar, javadoc, etc.
Artifact names should follow [the existing
format](https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.beam%22)
in which artifact name mirrors directory structure, e.g.,
`beam-sdks-java-io-kafka`.
Carefully review any new artifacts.
diff --git a/website/www/site/content/en/documentation/io/built-in/google-bigquery.md b/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
index c957f5e5ed6..1759016f706 100644
--- a/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
+++ b/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
@@ -92,7 +92,7 @@ a string, or use a
[TableReference](https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/index.html?com/google/api/services/bigquery/model/TableReference.html)
</span>
<span class="language-py">
-
[TableReference](https://github.com/googleapis/google-cloud-python/blob/master/bigquery/google/cloud/bigquery/table.py#L153)
+
[TableReference](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.bigquery.html#table-references)
</span>
object.
diff --git a/website/www/site/content/en/documentation/io/testing.md b/website/www/site/content/en/documentation/io/testing.md
index 0f23bddd6e3..d7a17adf398 100644
--- a/website/www/site/content/en/documentation/io/testing.md
+++ b/website/www/site/content/en/documentation/io/testing.md
@@ -172,7 +172,7 @@ Example usage on Cloud Dataflow runner:
Example usage on HDFS filesystem and Direct runner:
-NOTE: Below setup will only work when /etc/hosts file contains entries with
hadoop namenode and hadoop datanodes external IPs. Please see explanation in:
[Small Cluster config
file](https://github.com/apache/beam/blob/master/.test-infra/kubernetes/hadoop/SmallITCluster/pkb-config.yml)
and [Large Cluster config
file](https://github.com/apache/beam/blob/master/.test-infra/kubernetes/hadoop/LargeITCluster/pkb-config.yml).
+NOTE: Below setup will only work when /etc/hosts file contains entries with
hadoop namenode and hadoop datanodes external IPs. Please see explanation in:
[Small Cluster config
file](https://github.com/apache/beam/blob/master/.test-infra/kubernetes/hadoop/SmallITCluster/hdfs-single-datanode-cluster.yml)
and [Large Cluster config
file](https://github.com/apache/beam/blob/master/.test-infra/kubernetes/hadoop/LargeITCluster/hdfs-multi-datanode-cluster.yml).
```
export HADOOP_USER_NAME=root
@@ -334,7 +334,7 @@ If you modified/added new Jenkins job definitions in your
Pull Request, run the
As mentioned before, we measure the performance of IOITs by gathering test
execution times from Jenkins jobs that run periodically. The consequent results
are stored in a database (BigQuery), therefore we can display them in a form of
plots.
-The dashboard gathering all the results is available here: [Performance
Testing Dashboard](https://s.apache.org/io-test-dashboards)
+The dashboard gathering all the results is available here: [Performance
Testing
Dashboard](http://metrics.beam.apache.org/d/1/getting-started?orgId=1&viewPanel=123125)
### Implementing Integration Tests {#implementing-integration-tests}
diff --git a/website/www/site/content/en/documentation/programming-guide.md b/website/www/site/content/en/documentation/programming-guide.md
index d2f06417e8e..998be6fcaeb 100644
--- a/website/www/site/content/en/documentation/programming-guide.md
+++ b/website/www/site/content/en/documentation/programming-guide.md
@@ -3971,7 +3971,7 @@ Standard Go types like `int`, `int64` `float64`,
`[]byte`, and `string` and more
Structs and pointers to structs default using Beam Schema Row encoding.
However, users can build and register custom coders with `beam.RegisterCoder`.
You can find available Coder functions in the
-[coder](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coders)
+[coder](https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder)
package.
{{< /paragraph >}}
diff --git a/website/www/site/content/en/documentation/resources/learning-resources.md b/website/www/site/content/en/documentation/resources/learning-resources.md
index 8da964e0206..8e7cedd75f1 100644
--- a/website/www/site/content/en/documentation/resources/learning-resources.md
+++ b/website/www/site/content/en/documentation/resources/learning-resources.md
@@ -97,8 +97,7 @@ If you have additional material that you would like to see
here, please let us k
### Python
-* **[Python Qwik
Start](https://qwiklabs.com/focuses/1100?locale=en&parent=catalog)** (30m) -
Run a word count pipeline on the Dataflow runner.
-* **[NDVI from Landsat
Images](https://qwiklabs.com/focuses/1849?locale=en&parent=catalog)** (45m) -
Process Landsat satellite data in a distributed environment to compute the
[Normalized Difference Vegetation
Index](https://en.wikipedia.org/wiki/Normalized_difference_vegetation_index)
(NDVI).
+* **[Python Qwik
Start](https://www.qwiklabs.com/focuses/1098?parent=catalog)** (30m) - Run a
word count pipeline on the Dataflow runner.
* **[Simulate historic
flights](https://qwiklabs.com/focuses/1159?locale=en&parent=catalog)** (60m) -
Simulate real-time historic internal flights in the United States and store the
resulting simulated data in BigQuery.
## Beam Katas {#beam-katas}
diff --git a/website/www/site/content/en/documentation/runners/direct.md b/website/www/site/content/en/documentation/runners/direct.md
index 1249aa9a286..24acdf0bce3 100644
--- a/website/www/site/content/en/documentation/runners/direct.md
+++ b/website/www/site/content/en/documentation/runners/direct.md
@@ -36,7 +36,7 @@ Here are some resources with information about how to test
your pipelines.
<li class="language-java">The <a
href="/get-started/wordcount-example/#testing-your-pipeline-with-asserts">Apache
Beam WordCount Walkthrough</a> contains an example of logging and testing a
pipeline with <a href="https://beam.apache.org/releases/javadoc/{{< param
release_latest
>}}/index.html?org/apache/beam/sdk/testing/PAssert.html">PAssert</a>.</li>
<!-- Python specific links -->
- <li class="language-py">The <a
href="/get-started/wordcount-example/#testing-your-pipeline-with-asserts">Apache
Beam WordCount Walkthrough</a> contains an example of logging and testing a
pipeline with <a href="https://beam.apache.org/releases/pydoc/{{< param
release_latest
>}}/apache_beam.testing.util.html#apache_beam.testing.util.assert_that">assert_that</a>.</li>
+ <li class="language-py">The <a
href="/get-started/wordcount-example/#testing-your-pipeline-with-asserts">Apache
Beam WordCount Walkthrough</a> contains an example of logging and testing a
pipeline with <code>assert_that</code>.</li>
</ul>
## Direct Runner prerequisites and setup
diff --git a/website/www/site/content/en/documentation/runners/jstorm.md b/website/www/site/content/en/documentation/runners/jstorm.md
index cbf477d6127..6cbf00a8aa9 100644
--- a/website/www/site/content/en/documentation/runners/jstorm.md
+++ b/website/www/site/content/en/documentation/runners/jstorm.md
@@ -16,7 +16,7 @@ limitations under the License.
-->
# Using the JStorm Runner
-The JStorm Runner can be used to execute Beam pipelines using [JStorm](http://jstorm.io/), while providing:
+The JStorm Runner can be used to execute Beam pipelines using [JStorm](https://github.com/alibaba/jstorm), while providing:
* High throughput and low latency.
* At-least-once and exactly-once fault tolerance.
@@ -52,8 +52,6 @@ When you submit a topology with argument `"--external-libs
beam"`, JStorm will l
jstorm jar WordCount.jar org.apache.beam.examples.WordCount --external-libs
beam --runner=org.apache.beam.runners.jstorm.JStormRunner
```
-To learn about deploying a JStorm cluster, please refer to [JStorm cluster
deploy](http://jstorm.io/QuickStart/Deploy/index.html)
-
## Pipeline options for the JStorm Runner
When executing your pipeline with the JStorm Runner, you should consider the
following pipeline options.
diff --git a/website/www/site/content/en/documentation/runtime/environments.md b/website/www/site/content/en/documentation/runtime/environments.md
index aff23e2d3f4..2243bbe636b 100644
--- a/website/www/site/content/en/documentation/runtime/environments.md
+++ b/website/www/site/content/en/documentation/runtime/environments.md
@@ -102,7 +102,7 @@ This method requires building image artifacts from Beam
source. For additional i
git checkout origin/release-$BEAM_SDK_VERSION
```
-2. Customize the `Dockerfile` for a given language, typically
`sdks/<language>/container/Dockerfile` directory (e.g. the [Dockerfile for
Python](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile).
If you're adding dependencies from [PyPI](https://pypi.org/), use
[`base_image_requirements.txt`](https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt)
instead.
+2. Customize the `Dockerfile` for a given language, typically
`sdks/<language>/container/Dockerfile` directory (e.g. the [Dockerfile for
Python](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile).
3. Return to the root Beam directory and run the Gradle `docker` target for
your image.