This is an automated email from the ASF dual-hosted git repository.
yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 07f1341fac6f [MINOR][DOCS] Fix various broken links and link anchors
07f1341fac6f is described below
commit 07f1341fac6ff23f1c5a30c87a0d9946ca794e61
Author: Nicholas Chammas <[email protected]>
AuthorDate: Mon Feb 5 11:00:16 2024 +0800
[MINOR][DOCS] Fix various broken links and link anchors
### What changes were proposed in this pull request?
Fix various broken links and link anchors.
### Why are the changes needed?
Broken is broken, savvy?
### Does this PR introduce _any_ user-facing change?
Yes, it fixes broken links in user-facing documentation so they're no
longer broken.
### How was this patch tested?
I built the docs and clicked on some of the fixed links to confirm they are
no longer broken.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #45022 from nchammas/more-broken-links.
Lead-authored-by: Nicholas Chammas <[email protected]>
Co-authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
---
docs/_data/menu-mllib.yaml | 2 +-
docs/mllib-clustering.md | 4 ++--
docs/mllib-ensembles.md | 4 ++--
docs/mllib-guide.md | 5 ++---
docs/mllib-linear-methods.md | 7 +++----
docs/spark-standalone.md | 3 +--
docs/sparkr.md | 2 +-
docs/sql-data-sources-avro.md | 2 +-
docs/sql-data-sources.md | 2 +-
docs/sql-ref-syntax-aux-show-partitions.md | 2 +-
docs/streaming-programming-guide.md | 2 +-
docs/submitting-applications.md | 3 +--
docs/tuning.md | 2 +-
python/pyspark/mllib/clustering.py | 2 +-
14 files changed, 19 insertions(+), 23 deletions(-)
diff --git a/docs/_data/menu-mllib.yaml b/docs/_data/menu-mllib.yaml
index 12d22abd5282..0e709e267eb5 100644
--- a/docs/_data/menu-mllib.yaml
+++ b/docs/_data/menu-mllib.yaml
@@ -61,7 +61,7 @@
- text: association rules
url: mllib-frequent-pattern-mining.html#association-rules
- text: PrefixSpan
- url: mllib-frequent-pattern-mining.html#prefix-span
+ url: mllib-frequent-pattern-mining.html#prefixspan
- text: Evaluation metrics
url: mllib-evaluation-metrics.html
- text: PMML model export
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index b0be1d6227b3..fde354457f23 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -458,14 +458,14 @@ This example shows how to estimate clusters on streaming data.
<div data-lang="python" markdown="1">
Refer to the [`StreamingKMeans` Python docs](api/python/reference/api/pyspark.mllib.clustering.StreamingKMeans.html) for more details on the API.
-And Refer to [Spark Streaming Programming Guide](streaming-programming-guide.html#initializing) for details on StreamingContext.
+And Refer to [Spark Streaming Programming Guide](streaming-programming-guide.html#initializing-streamingcontext) for details on StreamingContext.
{% include_example python/mllib/streaming_k_means_example.py %}
</div>
<div data-lang="scala" markdown="1">
Refer to the [`StreamingKMeans` Scala docs](api/scala/org/apache/spark/mllib/clustering/StreamingKMeans.html) for details on the API.
-And Refer to [Spark Streaming Programming Guide](streaming-programming-guide.html#initializing) for details on StreamingContext.
+And Refer to [Spark Streaming Programming Guide](streaming-programming-guide.html#initializing-streamingcontext) for details on StreamingContext.
{% include_example scala/org/apache/spark/examples/mllib/StreamingKMeansExample.scala %}
</div>
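For readers following the corrected #initializing-streamingcontext anchor, the setup both hunks refer to looks roughly like this in PySpark; a minimal sketch only, with placeholder paths and parameters that are not part of this patch:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.mllib.clustering import StreamingKMeans
    from pyspark.mllib.linalg import Vectors

    sc = SparkContext(appName="StreamingKMeansSketch")
    ssc = StreamingContext(sc, 1)  # 1-second batch interval

    # Training stream: one whitespace-separated vector per line ("0.0 1.1 0.2");
    # "data/training_dir" is a placeholder directory.
    training = ssc.textFileStream("data/training_dir").map(
        lambda line: Vectors.dense([float(x) for x in line.split()]))

    # k=2 clusters over 3-dimensional points, random initial centers.
    model = StreamingKMeans(k=2, decayFactor=1.0).setRandomCenters(3, 1.0, 0)
    model.trainOn(training)

    ssc.start()
    ssc.awaitTermination()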
diff --git a/docs/mllib-ensembles.md b/docs/mllib-ensembles.md
index fdad7ae68dd4..8f4e6b1088b3 100644
--- a/docs/mllib-ensembles.md
+++ b/docs/mllib-ensembles.md
@@ -29,7 +29,7 @@ Both use [decision trees](mllib-decision-tree.html) as their base models.
## Gradient-Boosted Trees vs. Random Forests
-Both [Gradient-Boosted Trees (GBTs)](mllib-ensembles.html#Gradient-Boosted-Trees-(GBTS)) and [Random Forests](mllib-ensembles.html#Random-Forests) are algorithms for learning ensembles of trees, but the training processes are different. There are several practical trade-offs:
+Both [Gradient-Boosted Trees (GBTs)](mllib-ensembles.html#gradient-boosted-trees-gbts) and [Random Forests](mllib-ensembles.html#random-forests) are algorithms for learning ensembles of trees, but the training processes are different. There are several practical trade-offs:
* GBTs train one tree at a time, so they can take longer to train than random forests. Random Forests can train multiple trees in parallel.
* On the other hand, it is often reasonable to use smaller (shallower) trees with GBTs than with Random Forests, and training smaller trees takes less time.
@@ -175,7 +175,7 @@ using both continuous and categorical features.
`spark.mllib` implements GBTs using the existing [decision tree](mllib-decision-tree.html) implementation. Please see the decision tree guide for more information on trees.
*Note*: GBTs do not yet support multiclass classification. For multiclass problems, please use
-[decision trees](mllib-decision-tree.html) or [Random Forests](mllib-ensembles.html#Random-Forest).
+[decision trees](mllib-decision-tree.html) or [Random Forests](mllib-ensembles.html#random-forests).
### Basic algorithm
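As a rough illustration of the trade-offs listed in that hunk (sequential trees for GBTs versus parallel trees for Random Forests), both `spark.mllib` ensembles are trained through similar calls; a sketch with placeholder parameters, assuming a LibSVM file such as the `data/mllib/sample_libsvm_data.txt` shipped with Spark:

    from pyspark import SparkContext
    from pyspark.mllib.tree import GradientBoostedTrees, RandomForest
    from pyspark.mllib.util import MLUtils

    sc = SparkContext(appName="EnsembleSketch")
    data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

    # Random Forest: many trees trained independently of each other.
    rf = RandomForest.trainClassifier(
        data, numClasses=2, categoricalFeaturesInfo={},
        numTrees=100, maxDepth=4, seed=42)

    # GBTs: trees trained sequentially, so typically fewer and shallower.
    gbt = GradientBoostedTrees.trainClassifier(
        data, categoricalFeaturesInfo={}, numIterations=20, maxDepth=3)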
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index dbb74407d030..080f80531a6d 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -44,7 +44,7 @@ which is now the primary API for MLlib.
* [Gaussian mixture](mllib-clustering.html#gaussian-mixture)
* [power iteration clustering (PIC)](mllib-clustering.html#power-iteration-clustering-pic)
* [latent Dirichlet allocation (LDA)](mllib-clustering.html#latent-dirichlet-allocation-lda)
- * [bisecting k-means](mllib-clustering.html#bisecting-kmeans)
+ * [bisecting k-means](mllib-clustering.html#bisecting-k-means)
* [streaming k-means](mllib-clustering.html#streaming-k-means)
* [Dimensionality reduction](mllib-dimensionality-reduction.html)
* [singular value decomposition (SVD)](mllib-dimensionality-reduction.html#singular-value-decomposition-svd)
@@ -53,10 +53,9 @@ which is now the primary API for MLlib.
* [Frequent pattern mining](mllib-frequent-pattern-mining.html)
* [FP-growth](mllib-frequent-pattern-mining.html#fp-growth)
* [association rules](mllib-frequent-pattern-mining.html#association-rules)
- * [PrefixSpan](mllib-frequent-pattern-mining.html#prefix-span)
+ * [PrefixSpan](mllib-frequent-pattern-mining.html#prefixspan)
* [Evaluation metrics](mllib-evaluation-metrics.html)
* [PMML model export](mllib-pmml-model-export.html)
* [Optimization (developer)](mllib-optimization.html)
* [stochastic gradient descent](mllib-optimization.html#stochastic-gradient-descent-sgd)
* [limited-memory BFGS (L-BFGS)](mllib-optimization.html#limited-memory-bfgs-l-bfgs)
-
diff --git a/docs/mllib-linear-methods.md b/docs/mllib-linear-methods.md
index 448d881f794a..f8cd73eb2d0b 100644
--- a/docs/mllib-linear-methods.md
+++ b/docs/mllib-linear-methods.md
@@ -138,7 +138,7 @@ especially when the number of training examples is small.
Under the hood, linear methods use convex optimization methods to optimize the objective functions.
`spark.mllib` uses two methods, SGD and L-BFGS, described in the [optimization section](mllib-optimization.html).
Currently, most algorithm APIs support Stochastic Gradient Descent (SGD), and a few support L-BFGS.
-Refer to [this optimization section](mllib-optimization.html#Choosing-an-Optimization-Method) for guidelines on choosing between optimization methods.
+Refer to [this optimization section](mllib-optimization.html#choosing-an-optimization-method) for guidelines on choosing between optimization methods.
## Classification
@@ -383,7 +383,7 @@ online to the first stream, and make predictions on the second stream.
First, we import the necessary classes for parsing our input data and creating the model.
Then we make input streams for training and testing data. We assume a StreamingContext `ssc`
-has already been created, see [Spark Streaming Programming Guide](streaming-programming-guide.html#initializing)
+has already been created, see [Spark Streaming Programming Guide](streaming-programming-guide.html#initializing-streamingcontext)
for more info. For this example, we use labeled points in training and testing streams,
but in practice you will likely want to use unlabeled vectors for test data.
@@ -408,7 +408,7 @@ Here a complete example:
First, we import the necessary classes for parsing our input data and creating the model.
Then we make input streams for training and testing data. We assume a StreamingContext `ssc`
-has already been created, see [Spark Streaming Programming Guide](streaming-programming-guide.html#initializing)
+has already been created, see [Spark Streaming Programming Guide](streaming-programming-guide.html#initializing-streamingcontext)
for more info. For this example, we use labeled points in training and testing streams,
but in practice you will likely want to use unlabeled vectors for test data.
@@ -456,4 +456,3 @@ Algorithms are all implemented in Scala:
* [LinearRegressionWithSGD](api/scala/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html)
* [RidgeRegressionWithSGD](api/scala/org/apache/spark/mllib/regression/RidgeRegressionWithSGD.html)
* [LassoWithSGD](api/scala/org/apache/spark/mllib/regression/LassoWithSGD.html)
-
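The streaming-regression walk-through those two hunks describe boils down to the following PySpark skeleton; a sketch only, where `ssc` is the StreamingContext the guide assumes already exists and the directories are placeholders:

    from pyspark.mllib.linalg import Vectors
    from pyspark.mllib.regression import LabeledPoint, StreamingLinearRegressionWithSGD

    # Input lines are assumed to be of the form "(y,[x1,x2,x3])".
    def parse(line):
        label, features = line.strip("()").split(",", 1)
        return LabeledPoint(
            float(label),
            Vectors.dense([float(x) for x in features.strip("[]").split(",")]))

    train_data = ssc.textFileStream("data/train_dir").map(parse)
    test_data = ssc.textFileStream("data/test_dir").map(parse)

    model = StreamingLinearRegressionWithSGD()
    model.setInitialWeights([0.0, 0.0, 0.0])  # one weight per feature

    model.trainOn(train_data)
    # Keep the label alongside the features so predictions can be compared to it.
    model.predictOnValues(test_data.map(lambda lp: (lp.label, lp.features))).pprint()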
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index a21d16419fd1..fbc83180d6b6 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -751,8 +751,7 @@ Learn more about getting started with ZooKeeper [here](https://zookeeper.apache.
**Configuration**
-In order to enable this recovery mode, you can set SPARK_DAEMON_JAVA_OPTS in spark-env by configuring `spark.deploy.recoveryMode` and related spark.deploy.zookeeper.* configurations.
-For more information about these configurations please refer to the [configuration doc](configuration.html#deploy)
+In order to enable this recovery mode, you can set `SPARK_DAEMON_JAVA_OPTS` in spark-env by configuring `spark.deploy.recoveryMode` and related `spark.deploy.zookeeper.*` configurations.
Possible gotcha: If you have multiple Masters in your cluster but fail to correctly configure the Masters to use ZooKeeper, the Masters will fail to discover each other and think they're all leaders. This will not lead to a healthy cluster state (as all Masters will schedule independently).
diff --git a/docs/sparkr.md b/docs/sparkr.md
index a34a1200c4c0..ef99ea961c9b 100644
--- a/docs/sparkr.md
+++ b/docs/sparkr.md
@@ -571,7 +571,7 @@ SparkR supports the following machine learning algorithms currently:
#### Frequent Pattern Mining
* [`spark.fpGrowth`](api/R/reference/spark.fpGrowth.html) : [`FP-growth`](ml-frequent-pattern-mining.html#fp-growth)
-* [`spark.prefixSpan`](api/R/reference/spark.prefixSpan.html) : [`PrefixSpan`](ml-frequent-pattern-mining.html#prefixSpan)
+* [`spark.prefixSpan`](api/R/reference/spark.prefixSpan.html) : [`PrefixSpan`](ml-frequent-pattern-mining.html#prefixspan)
#### Statistics
diff --git a/docs/sql-data-sources-avro.md b/docs/sql-data-sources-avro.md
index 712d4d3b8cd4..d71789956429 100644
--- a/docs/sql-data-sources-avro.md
+++ b/docs/sql-data-sources-avro.md
@@ -274,7 +274,7 @@ Data source options of Avro can be set via:
<tr>
<td><code>ignoreExtension</code></td>
<td>true</td>
- <td>The option controls ignoring of files without <code>.avro</code> extensions in read.<br> If the option is enabled, all files (with and without <code>.avro</code> extension) are loaded.<br> The option has been deprecated, and it will be removed in the future releases. Please use the general data source option <a href="./sql-data-sources-generic-options.html#path-global-filter">pathGlobFilter</a> for filtering file names.</td>
+ <td>The option controls ignoring of files without <code>.avro</code> extensions in read.<br> If the option is enabled, all files (with and without <code>.avro</code> extension) are loaded.<br> The option has been deprecated, and it will be removed in the future releases. Please use the general data source option <a href="./sql-data-sources-generic-options.html#path-glob-filter">pathGlobFilter</a> for filtering file names.</td>
<td>read</td>
<td>2.4.0</td>
</tr>
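The `pathGlobFilter` option that the corrected link points at is a generic reader option, so the migration away from `ignoreExtension` looks roughly like this; a sketch that assumes the external spark-avro package is on the classpath and uses a placeholder path:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("AvroGlobSketch").getOrCreate()

    # Instead of the deprecated ignoreExtension option, filter file names with
    # the generic pathGlobFilter option when reading Avro data.
    df = (spark.read.format("avro")
          .option("pathGlobFilter", "*.avro")  # skip files lacking the .avro extension
          .load("data/events"))
    df.printSchema()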
diff --git a/docs/sql-data-sources.md b/docs/sql-data-sources.md
index a2bedd178abb..8ac819e1285e 100644
--- a/docs/sql-data-sources.md
+++ b/docs/sql-data-sources.md
@@ -36,7 +36,7 @@ goes into specific options that are available for the built-in data sources.
* [Generic File Source Options](sql-data-sources-generic-options.html)
* [Ignore Corrupt Files](sql-data-sources-generic-options.html#ignore-corrupt-files)
* [Ignore Missing Files](sql-data-sources-generic-options.html#ignore-missing-files)
- * [Path Global Filter](sql-data-sources-generic-options.html#path-global-filter)
+ * [Path Glob Filter](sql-data-sources-generic-options.html#path-glob-filter)
* [Recursive File Lookup](sql-data-sources-generic-options.html#recursive-file-lookup)
* [Parquet Files](sql-data-sources-parquet.html)
* [Loading Data Programmatically](sql-data-sources-parquet.html#loading-data-programmatically)
diff --git a/docs/sql-ref-syntax-aux-show-partitions.md b/docs/sql-ref-syntax-aux-show-partitions.md
index d93825550413..0b2ed3507e29 100644
--- a/docs/sql-ref-syntax-aux-show-partitions.md
+++ b/docs/sql-ref-syntax-aux-show-partitions.md
@@ -105,6 +105,6 @@ SHOW PARTITIONS customer PARTITION (city = 'San Jose');
### Related Statements
* [CREATE TABLE](sql-ref-syntax-ddl-create-table.html)
-* [INSERT STATEMENT](sql-ref-syntax-dml-insert.html)
+* [INSERT STATEMENT](sql-ref-syntax-dml-insert-table.html)
* [DESCRIBE TABLE](sql-ref-syntax-aux-describe-table.html)
* [SHOW TABLE](sql-ref-syntax-aux-show-table.html)
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index 96dd5528aac5..21e8fe6e8333 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -295,7 +295,7 @@ The complete code can be found in the Spark Streaming example
</div>
-If you have already [downloaded](index.html#downloading) and [built](index.html#building) Spark,
+If you have already [downloaded](index.html#downloading) and [built](building-spark.html) Spark,
you can run this example as follows. You will first need to run Netcat
(a small utility found in most Unix-like systems) as a data server by using
diff --git a/docs/submitting-applications.md b/docs/submitting-applications.md
index 61517d5feacd..bf02ec137e20 100644
--- a/docs/submitting-applications.md
+++ b/docs/submitting-applications.md
@@ -179,8 +179,7 @@ The master URL passed to Spark can be in one of the following formats:
The `spark-submit` script can load default [Spark configuration values](configuration.html) from a
properties file and pass them on to your application. By default, it will read options
-from `conf/spark-defaults.conf` in the Spark directory. For more detail, see the section on
-[loading default configurations](configuration.html#loading-default-configurations).
+from `conf/spark-defaults.conf` in the `SPARK_HOME` directory.
Loading default Spark configurations this way can obviate the need for certain flags to
`spark-submit`. For instance, if the `spark.master` property is set, you can safely omit the
diff --git a/docs/tuning.md b/docs/tuning.md
index 94fe987175cf..f72dc0efd98e 100644
--- a/docs/tuning.md
+++ b/docs/tuning.md
@@ -196,7 +196,7 @@ the space allocated to the RDD cache to mitigate this.
**Measuring the Impact of GC**
The first step in GC tuning is to collect statistics on how frequently garbage collection occurs and the amount of
-time spent GC. This can be done by adding `-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps` to the Java options. (See the [configuration guide](configuration.html#Dynamically-Loading-Spark-Properties) for info on passing Java options to Spark jobs.) Next time your Spark job is run, you will see messages printed in the worker's logs
+time spent GC. This can be done by adding `-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps` to the Java options. (See the [configuration guide](configuration.html#dynamically-loading-spark-properties) for info on passing Java options to Spark jobs.) Next time your Spark job is run, you will see messages printed in the worker's logs
each time a garbage collection occurs. Note these logs will be on your cluster's worker nodes (in the `stdout` files in
their work directories), *not* on your driver program.
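One way to pass those GC flags to executors, per the corrected #dynamically-loading-spark-properties anchor, is via `spark.executor.extraJavaOptions`; a sketch only, and note the quoted `-XX:+Print*` flags are the legacy JDK 8 options (newer JDKs use `-Xlog:gc*`):

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    conf = SparkConf().set(
        "spark.executor.extraJavaOptions",
        "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
    spark = SparkSession.builder.config(conf=conf).appName("GCLoggingSketch").getOrCreate()
    # GC messages then show up in each executor's stdout file under its work
    # directory on the worker nodes, not in the driver log.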
diff --git a/python/pyspark/mllib/clustering.py b/python/pyspark/mllib/clustering.py
index 4595268edc6c..71f42954decb 100644
--- a/python/pyspark/mllib/clustering.py
+++ b/python/pyspark/mllib/clustering.py
@@ -1130,7 +1130,7 @@ class LDAModel(JavaModelWrapper, JavaSaveable, Loader["LDAModel"]):
.. [1] Blei, D. et al. "Latent Dirichlet Allocation."
J. Mach. Learn. Res. 3 (2003): 993-1022.
- https://www.jmlr.org/papers/v3/blei03a
+ https://web.archive.org/web/20220128160306/https://www.jmlr.org/papers/v3/blei03a
Examples
--------
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]