This is an automated email from the ASF dual-hosted git repository.
yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 07f1341fac6f [MINOR][DOCS] Fix various broken links and link anchors
07f1341fac6f is described below
commit 07f1341fac6ff23f1c5a30c87a0d9946ca794e61
Author: Nicholas Chammas <[email protected]>
AuthorDate: Mon Feb 5 11:00:16 2024 +0800
[MINOR][DOCS] Fix various broken links and link anchors
### What changes were proposed in this pull request?
Fix various broken links and link anchors.
### Why are the changes needed?
Broken is broken, savvy?
### Does this PR introduce _any_ user-facing change?
Yes, it fixes broken links in user-facing documentation so they're no
longer broken.
### How was this patch tested?
I built the docs and clicked on some of the fixed links to confirm they are
no longer broken.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #45022 from nchammas/more-broken-links.
Lead-authored-by: Nicholas Chammas <[email protected]>
Co-authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
---
docs/_data/menu-mllib.yaml | 2 +-
docs/mllib-clustering.md | 4 ++--
docs/mllib-ensembles.md | 4 ++--
docs/mllib-guide.md | 5 ++---
docs/mllib-linear-methods.md | 7 +++----
docs/spark-standalone.md | 3 +--
docs/sparkr.md | 2 +-
docs/sql-data-sources-avro.md | 2 +-
docs/sql-data-sources.md | 2 +-
docs/sql-ref-syntax-aux-show-partitions.md | 2 +-
docs/streaming-programming-guide.md | 2 +-
docs/submitting-applications.md | 3 +--
docs/tuning.md | 2 +-
python/pyspark/mllib/clustering.py | 2 +-
14 files changed, 19 insertions(+), 23 deletions(-)
diff --git a/docs/_data/menu-mllib.yaml b/docs/_data/menu-mllib.yaml
index 12d22abd5282..0e709e267eb5 100644
--- a/docs/_data/menu-mllib.yaml
+++ b/docs/_data/menu-mllib.yaml
@@ -61,7 +61,7 @@
- text: association rules
url: mllib-frequent-pattern-mining.html#association-rules
- text: PrefixSpan
- url: mllib-frequent-pattern-mining.html#prefix-span
+ url: mllib-frequent-pattern-mining.html#prefixspan
- text: Evaluation metrics
url: mllib-evaluation-metrics.html
- text: PMML model export
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index b0be1d6227b3..fde354457f23 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -458,14 +458,14 @@ This example shows how to estimate clusters on streaming data.
<div data-lang="python" markdown="1">
Refer to the [`StreamingKMeans` Python docs](api/python/reference/api/pyspark.mllib.clustering.StreamingKMeans.html) for more details on the API.
-And Refer to [Spark Streaming Programming Guide](streaming-programming-guide.html#initializing) for details on StreamingContext.
+And Refer to [Spark Streaming Programming Guide](streaming-programming-guide.html#initializing-streamingcontext) for details on StreamingContext.
{% include_example python/mllib/streaming_k_means_example.py %}
</div>
<div data-lang="scala" markdown="1">
Refer to the [`StreamingKMeans` Scala docs](api/scala/org/apache/spark/mllib/clustering/StreamingKMeans.html) for details on the API.
-And Refer to [Spark Streaming Programming Guide](streaming-programming-guide.html#initializing) for details on StreamingContext.
+And Refer to [Spark Streaming Programming Guide](streaming-programming-guide.html#initializing-streamingcontext) for details on StreamingContext.
{% include_example scala/org/apache/spark/examples/mllib/StreamingKMeansExample.scala %}
</div>
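For readers following the corrected #initializing-streamingcontext anchor, the setup both hunks refer to looks roughly like this in PySpark; a minimal sketch only, with placeholder paths and parameters that are not part of this patch:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.mllib.clustering import StreamingKMeans
    from pyspark.mllib.linalg import Vectors

    sc = SparkContext(appName="StreamingKMeansSketch")
    ssc = StreamingContext(sc, 1)  # 1-second batch interval

    # Training stream: one whitespace-separated vector per line ("0.0 1.1 0.2");
    # "data/training_dir" is a placeholder directory.
    training = ssc.textFileStream("data/training_dir").map(
        lambda line: Vectors.dense([float(x) for x in line.split()]))

    # k=2 clusters over 3-dimensional points, random initial centers.
    model = StreamingKMeans(k=2, decayFactor=1.0).setRandomCenters(3, 1.0, 0)
    model.trainOn(training)

    ssc.start()
    ssc.awaitTermination()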
diff --git a/docs/mllib-ensembles.md b/docs/mllib-ensembles.md
index fdad7ae68dd4..8f4e6b1088b3 100644
--- a/docs/mllib-ensembles.md
+++ b/docs/mllib-ensembles.md
@@ -29,7 +29,7 @@ Both use [decision trees](mllib-decision-tree.html) as their base models.
## Gradient-Boosted Trees vs. Random Forests
-Both [Gradient-Boosted Trees (GBTs)](mllib-ensembles.html#Gradient-Boosted-Trees-(GBTS)) and [Random Forests](mllib-ensembles.html#Random-Forests) are algorithms for learning ensembles of trees, but the training processes are different. There are several practical trade-offs:
+Both [Gradient-Boosted Trees (GBTs)](mllib-ensembles.html#gradient-boosted-trees-gbts) and [Random Forests](mllib-ensembles.html#random-forests) are algorithms for learning ensembles of trees, but the training processes are different. There are several practical trade-offs:
* GBTs train one tree at a time, so they can take longer to train than random forests. Random Forests can train multiple trees in parallel.
* On the other hand, it is often reasonable to use smaller (shallower) trees with GBTs than with Random Forests, and training smaller trees takes less time.
@@ -175,7 +175,7 @@ using both continuous and categorical features.
`spark.mllib` implements GBTs using the existing [decision tree](mllib-decision-tree.html) implementation. Please see the decision tree guide for more information on trees.
*Note*: GBTs do not yet support multiclass classification. For multiclass problems, please use
-[decision trees](mllib-decision-tree.html) or [Random Forests](mllib-ensembles.html#Random-Forest).
+[decision trees](mllib-decision-tree.html) or [Random Forests](mllib-ensembles.html#random-forests).
### Basic algorithm
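As a rough illustration of the trade-offs listed in that hunk (sequential trees for GBTs versus parallel trees for Random Forests), both `spark.mllib` ensembles are trained through similar calls; a sketch with placeholder parameters, assuming a LibSVM file such as the `data/mllib/sample_libsvm_data.txt` shipped with Spark:

    from pyspark import SparkContext
    from pyspark.mllib.tree import GradientBoostedTrees, RandomForest
    from pyspark.mllib.util import MLUtils

    sc = SparkContext(appName="EnsembleSketch")
    data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

    # Random Forest: many trees trained independently of each other.
    rf = RandomForest.trainClassifier(
        data, numClasses=2, categoricalFeaturesInfo={},
        numTrees=100, maxDepth=4, seed=42)

    # GBTs: trees trained sequentially, so typically fewer and shallower.
    gbt = GradientBoostedTrees.trainClassifier(
        data, categoricalFeaturesInfo={}, numIterations=20, maxDepth=3)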
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index dbb74407d030..080f80531a6d 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -44,7 +44,7 @@ which is now the primary API for MLlib.
* [Gaussian mixture](mllib-clustering.html#gaussian-mixture)
* [power iteration clustering (PIC)](mllib-clustering.html#power-iteration-clustering-pic)
* [latent Dirichlet allocation (LDA)](mllib-clustering.html#latent-dirichlet-allocation-lda)
- * [bisecting k-means](mllib-clustering.html#bisecting-kmeans)
+ * [bisecting k-means](mllib-clustering.html#bisecting-k-means)
* [streaming k-means](mllib-clustering.html#streaming-k-means)
* [Dimensionality reduction](mllib-dimensionality-reduction.html)
* [singular value decomposition (SVD)](mllib-dimensionality-reduction.html#singular-value-decomposition-svd)
@@ -53,10 +53,9 @@ which is now the primary API for MLlib.
* [Frequent pattern mining](mllib-frequent-pattern-mining.html)
* [FP-growth](mllib-frequent-pattern-mining.html#fp-growth)
* [association rules](mllib-frequent-pattern-mining.html#association-rules)
- * [PrefixSpan](mllib-frequent-pattern-mining.html#prefix-span)
+ * [PrefixSpan](mllib-frequent-pattern-mining.html#prefixspan)
* [Evaluation metrics](mllib-evaluation-metrics.html)
* [PMML model export](mllib-pmml-model-export.html)
* [Optimization (developer)](mllib-optimization.html)
* [stochastic gradient descent](mllib-optimization.html#stochastic-gradient-descent-sgd)
* [limited-memory BFGS (L-BFGS)](mllib-optimization.html#limited-memory-bfgs-l-bfgs)
-
diff --git a/docs/mllib-linear-methods.md b/docs/mllib-linear-methods.md
index 448d881f794a..f8cd73eb2d0b 100644
--- a/docs/mllib-linear-methods.md
+++ b/docs/mllib-linear-methods.md
@@ -138,7 +138,7 @@ especially when the number of training examples is small.
Under the hood, linear methods use convex optimization methods to optimize the objective functions.
`spark.mllib` uses two methods, SGD and L-BFGS, described in the [optimization section](mllib-optimization.html).
Currently, most algorithm APIs support Stochastic Gradient Descent (SGD), and a few support L-BFGS.
-Refer to [this optimization section](mllib-optimization.html#Choosing-an-Optimization-Method) for guidelines on choosing between optimization methods.
+Refer to [this optimization section](mllib-optimization.html#choosing-an-optimization-method) for guidelines on choosing between optimization methods.
## Classification
@@ -383,7 +383,7 @@ online to the first stream, and make predictions on the second stream.
First, we import the necessary classes for parsing our input data and creating the model.
Then we make input streams for training and testing data. We assume a StreamingContext `ssc`
-has already been created, see [Spark Streaming Programming Guide](streaming-programming-guide.html#initializing)
+has already been created, see [Spark Streaming Programming Guide](streaming-programming-guide.html#initializing-streamingcontext)
for more info. For this example, we use labeled points in training and testing streams,
but in practice you will likely want to use unlabeled vectors for test data.
@@ -408,7 +408,7 @@ Here a complete example:
First, we import the necessary classes for parsing our input data and creating the model.
Then we make input streams for training and testing data. We assume a StreamingContext `ssc`
-has already been created, see [Spark Streaming Programming Guide](streaming-programming-guide.html#initializing)
+has already been created, see [Spark Streaming Programming Guide](streaming-programming-guide.html#initializing-streamingcontext)
for more info. For this example, we use labeled points in training and testing streams,
but in practice you will likely want to use unlabeled vectors for test data.
@@ -456,4 +456,3 @@ Algorithms are all implemented in Scala:
* [LinearRegressionWithSGD](api/scala/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html)
* [RidgeRegressionWithSGD](api/scala/org/apache/spark/mllib/regression/RidgeRegressionWithSGD.html)
* [LassoWithSGD](api/scala/org/apache/spark/mllib/regression/LassoWithSGD.html)
-
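The streaming-regression walk-through those two hunks describe boils down to the following PySpark skeleton; a sketch only, where `ssc` is the StreamingContext the guide assumes already exists and the directories are placeholders:

    from pyspark.mllib.linalg import Vectors
    from pyspark.mllib.regression import LabeledPoint, StreamingLinearRegressionWithSGD

    # Input lines are assumed to be of the form "(y,[x1,x2,x3])".
    def parse(line):
        label, features = line.strip("()").split(",", 1)
        return LabeledPoint(
            float(label),
            Vectors.dense([float(x) for x in features.strip("[]").split(",")]))

    train_data = ssc.textFileStream("data/train_dir").map(parse)
    test_data = ssc.textFileStream("data/test_dir").map(parse)

    model = StreamingLinearRegressionWithSGD()
    model.setInitialWeights([0.0, 0.0, 0.0])  # one weight per feature

    model.trainOn(train_data)
    # Keep the label alongside the features so predictions can be compared to it.
    model.predictOnValues(test_data.map(lambda lp: (lp.label, lp.features))).pprint()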
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index a21d16419fd1..fbc83180d6b6 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -751,8 +751,7 @@ Learn more about getting started with ZooKeeper [here](https://zookeeper.apache.
**Configuration**
-In order to enable this recovery mode, you can set SPARK_DAEMON_JAVA_OPTS in spark-env by configuring `spark.deploy.recoveryMode` and related spark.deploy.zookeeper.* configurations.
-For more information about these configurations please refer to the [configuration doc](configuration.html#deploy)
+In order to enable this recovery mode, you can set `SPARK_DAEMON_JAVA_OPTS` in spark-env by configuring `spark.deploy.recoveryMode` and related `spark.deploy.zookeeper.*` configurations.
Possible gotcha: If you have multiple Masters in your cluster but fail to correctly configure the Masters to use ZooKeeper, the Masters will fail to discover each other and think they're all leaders. This will not lead to a healthy cluster state (as all Masters will schedule independently).
diff --git a/docs/sparkr.md b/docs/sparkr.md
index a34a1200c4c0..ef99ea961c9b 100644
--- a/docs/sparkr.md
+++ b/docs/sparkr.md
@@ -571,7 +571,7 @@ SparkR supports the following machine learning algorithms currently:
#### Frequent Pattern Mining
* [`spark.fpGrowth`](api/R/reference/spark.fpGrowth.html) : [`FP-growth`](ml-frequent-pattern-mining.html#fp-growth)
-* [`spark.prefixSpan`](api/R/reference/spark.prefixSpan.html) : [`PrefixSpan`](ml-frequent-pattern-mining.html#prefixSpan)
+* [`spark.prefixSpan`](api/R/reference/spark.prefixSpan.html) : [`PrefixSpan`](ml-frequent-pattern-mining.html#prefixspan)
#### Statistics
diff --git a/docs/sql-data-sources-avro.md b/docs/sql-data-sources-avro.md
index 712d4d3b8cd4..d71789956429 100644
--- a/docs/sql-data-sources-avro.md
+++ b/docs/sql-data-sources-avro.md
@@ -274,7 +274,7 @@ Data source options of Avro can be set via:
<tr>
<td><code>ignoreExtension</code></td>
<td>true</td>
- <td>The option controls ignoring of files without <code>.avro</code> extensions in read.<br> If the option is enabled, all files (with and without <code>.avro</code> extension) are loaded.<br> The option has been deprecated, and it will be removed in the future releases. Please use the general data source option <a href="./sql-data-sources-generic-options.html#path-global-filter">pathGlobFilter</a> for filtering file names.</td>
+ <td>The option controls ignoring of files without <code>.avro</code> extensions in read.<br> If the option is enabled, all files (with and without <code>.avro</code> extension) are loaded.<br> The option has been deprecated, and it will be removed in the future releases. Please use the general data source option <a href="./sql-data-sources-generic-options.html#path-glob-filter">pathGlobFilter</a> for filtering file names.</td>
<td>read</td>
<td>2.4.0</td>
</tr>
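The `pathGlobFilter` option that the corrected link points at is a generic reader option, so the migration away from `ignoreExtension` looks roughly like this; a sketch that assumes the external spark-avro package is on the classpath and uses a placeholder path:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("AvroGlobSketch").getOrCreate()

    # Instead of the deprecated ignoreExtension option, filter file names with
    # the generic pathGlobFilter option when reading Avro data.
    df = (spark.read.format("avro")
          .option("pathGlobFilter", "*.avro")  # skip files lacking the .avro extension
          .load("data/events"))
    df.printSchema()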
diff --git a/docs/sql-data-sources.md b/docs/sql-data-sources.md
index a2bedd178abb..8ac819e1285e 100644
--- a/docs/sql-data-sources.md
+++ b/docs/sql-data-sources.md
@@ -36,7 +36,7 @@ goes into specific options that are available for the built-in data sources.
* [Generic File Source Options](sql-data-sources-generic-options.html)
* [Ignore Corrupt Files](sql-data-sources-generic-options.html#ignore-corrupt-files)
* [Ignore Missing Files](sql-data-sources-generic-options.html#ignore-missing-files)
- * [Path Global Filter](sql-data-sources-generic-options.html#path-global-filter)
+ * [Path Glob Filter](sql-data-sources-generic-options.html#path-glob-filter)
* [Recursive File Lookup](sql-data-sources-generic-options.html#recursive-file-lookup)
* [Parquet Files](sql-data-sources-parquet.html)
* [Loading Data Programmatically](sql-data-sources-parquet.html#loading-data-programmatically)
diff --git a/docs/sql-ref-syntax-aux-show-partitions.md b/docs/sql-ref-syntax-aux-show-partitions.md
index d93825550413..0b2ed3507e29 100644
--- a/docs/sql-ref-syntax-aux-show-partitions.md
+++ b/docs/sql-ref-syntax-aux-show-partitions.md
@@ -105,6 +105,6 @@ SHOW PARTITIONS customer PARTITION (city = 'San Jose');
### Related Statements
* [CREATE TABLE](sql-ref-syntax-ddl-create-table.html)
-* [INSERT STATEMENT](sql-ref-syntax-dml-insert.html)
+* [INSERT STATEMENT](sql-ref-syntax-dml-insert-table.html)
* [DESCRIBE TABLE](sql-ref-syntax-aux-describe-table.html)
* [SHOW TABLE](sql-ref-syntax-aux-show-table.html)
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index 96dd5528aac5..21e8fe6e8333 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -295,7 +295,7 @@ The complete code can be found in the Spark Streaming example
</div>
-If you have already [downloaded](index.html#downloading) and [built](index.html#building) Spark,
+If you have already [downloaded](index.html#downloading) and [built](building-spark.html) Spark,
you can run this example as follows. You will first need to run Netcat
(a small utility found in most Unix-like systems) as a data server by using
diff --git a/docs/submitting-applications.md b/docs/submitting-applications.md
index 61517d5feacd..bf02ec137e20 100644
--- a/docs/submitting-applications.md
+++ b/docs/submitting-applications.md
@@ -179,8 +179,7 @@ The master URL passed to Spark can be in one of the following formats:
The `spark-submit` script can load default [Spark configuration values](configuration.html) from a
properties file and pass them on to your application. By default, it will read options
-from `conf/spark-defaults.conf` in the Spark directory. For more detail, see the section on
-[loading default configurations](configuration.html#loading-default-configurations).
+from `conf/spark-defaults.conf` in the `SPARK_HOME` directory.
Loading default Spark configurations this way can obviate the need for certain flags to
`spark-submit`. For instance, if the `spark.master` property is set, you can safely omit the
diff --git a/docs/tuning.md b/docs/tuning.md
index 94fe987175cf..f72dc0efd98e 100644
--- a/docs/tuning.md
+++ b/docs/tuning.md
@@ -196,7 +196,7 @@ the space allocated to the RDD cache to mitigate this.
**Measuring the Impact of GC**
The first step in GC tuning is to collect statistics on how frequently garbage collection occurs and the amount of
-time spent GC. This can be done by adding `-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps` to the Java options. (See the [configuration guide](configuration.html#Dynamically-Loading-Spark-Properties) for info on passing Java options to Spark jobs.) Next time your Spark job is run, you will see messages printed in the worker's logs
+time spent GC. This can be done by adding `-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps` to the Java options. (See the [configuration guide](configuration.html#dynamically-loading-spark-properties) for info on passing Java options to Spark jobs.) Next time your Spark job is run, you will see messages printed in the worker's logs
each time a garbage collection occurs. Note these logs will be on your cluster's worker nodes (in the `stdout` files in
their work directories), *not* on your driver program.
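One way to pass those GC flags to executors, per the corrected #dynamically-loading-spark-properties anchor, is via `spark.executor.extraJavaOptions`; a sketch only, and note the quoted `-XX:+Print*` flags are the legacy JDK 8 options (newer JDKs use `-Xlog:gc*`):

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    conf = SparkConf().set(
        "spark.executor.extraJavaOptions",
        "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
    spark = SparkSession.builder.config(conf=conf).appName("GCLoggingSketch").getOrCreate()
    # GC messages then show up in each executor's stdout file under its work
    # directory on the worker nodes, not in the driver log.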
diff --git a/python/pyspark/mllib/clustering.py b/python/pyspark/mllib/clustering.py
index 4595268edc6c..71f42954decb 100644
--- a/python/pyspark/mllib/clustering.py
+++ b/python/pyspark/mllib/clustering.py
@@ -1130,7 +1130,7 @@ class LDAModel(JavaModelWrapper, JavaSaveable, Loader["LDAModel"]):
.. [1] Blei, D. et al. "Latent Dirichlet Allocation."
J. Mach. Learn. Res. 3 (2003): 993-1022.
- https://www.jmlr.org/papers/v3/blei03a
+ https://web.archive.org/web/20220128160306/https://www.jmlr.org/papers/v3/blei03a
Examples
--------
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]