This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new f078998df2f3 [MINOR][DOCS] Miscellaneous documentation improvements
f078998df2f3 is described below
commit f078998df2f3ad61a33b72b2dae18de4951cd15f
Author: Nicholas Chammas <[email protected]>
AuthorDate: Mon Jan 29 10:06:07 2024 +0900
[MINOR][DOCS] Miscellaneous documentation improvements
### What changes were proposed in this pull request?
- Improve the formatting of various code snippets.
- Fix some broken links in the documentation.
- Clarify the non-intuitive behavior of `displayValue` in
`getAllDefinedConfs()`.
### Why are the changes needed?
These are minor quality of life improvements for users and developers alike.
### Does this PR introduce _any_ user-facing change?
Yes, it tweaks some of the links in user-facing documentation.
### How was this patch tested?
Not tested beyond CI.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44919 from nchammas/misc-doc-fixes.
Authored-by: Nicholas Chammas <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
docs/configuration.md | 16 ++++++++++------
docs/mllib-dimensionality-reduction.md | 4 +++-
docs/rdd-programming-guide.md | 6 ++++--
docs/sql-data-sources-avro.md | 5 +++--
.../scala/org/apache/spark/sql/internal/SQLConf.scala | 7 ++++++-
5 files changed, 26 insertions(+), 12 deletions(-)
diff --git a/docs/configuration.md b/docs/configuration.md
index e771c323d369..7fef09781a15 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -88,10 +88,14 @@ val sc = new SparkContext(new SparkConf())
{% endhighlight %}
Then, you can supply configuration values at runtime:
-{% highlight bash %}
-./bin/spark-submit --name "My app" --master local[4] --conf spark.eventLog.enabled=false
-  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar
-{% endhighlight %}
+```sh
+./bin/spark-submit \
+  --name "My app" \
+  --master local[4] \
+  --conf spark.eventLog.enabled=false \
+  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
+  myApp.jar
+```
The Spark shell and [`spark-submit`](submitting-applications.html)
tool support two ways to load configurations dynamically. The first is command
line options,
@@ -3708,9 +3712,9 @@ Also, you can modify or add configurations at runtime:
GPUs and other accelerators have been widely used for accelerating special
workloads, e.g.,
deep learning and signal processing. Spark now supports requesting and
scheduling generic resources, such as GPUs, with a few caveats. The current
implementation requires that the resource have addresses that can be allocated
by the scheduler. It requires your cluster manager to support and be properly
configured with the resources.
-There are configurations available to request resources for the driver: <code>spark.driver.resource.{resourceName}.amount</code>, request resources for the executor(s): <code>spark.executor.resource.{resourceName}.amount</code> and specify the requirements for each task: <code>spark.task.resource.{resourceName}.amount</code>. The <code>spark.driver.resource.{resourceName}.discoveryScript</code> config is required on YARN, Kubernetes and a client side Driver on Spark Standalone. <code>spa [...]
+There are configurations available to request resources for the driver: `spark.driver.resource.{resourceName}.amount`, request resources for the executor(s): `spark.executor.resource.{resourceName}.amount` and specify the requirements for each task: `spark.task.resource.{resourceName}.amount`. The `spark.driver.resource.{resourceName}.discoveryScript` config is required on YARN, Kubernetes and a client side Driver on Spark Standalone. `spark.executor.resource.{resourceName}.discoveryScri [...]
-Spark will use the configurations specified to first request containers with the corresponding resources from the cluster manager. Once it gets the container, Spark launches an Executor in that container which will discover what resources the container has and the addresses associated with each resource. The Executor will register with the Driver and report back the resources available to that Executor. The Spark scheduler can then schedule tasks to each Executor and assign specific reso [...]
+Spark will use the configurations specified to first request containers with the corresponding resources from the cluster manager. Once it gets the container, Spark launches an Executor in that container which will discover what resources the container has and the addresses associated with each resource. The Executor will register with the Driver and report back the resources available to that Executor. The Spark scheduler can then schedule tasks to each Executor and assign specific reso [...]
See your cluster manager specific page for requirements and details on each of
- [YARN](running-on-yarn.html#resource-allocation-and-configuration-overview),
[Kubernetes](running-on-kubernetes.html#resource-allocation-and-configuration-overview)
and [Standalone
Mode](spark-standalone.html#resource-allocation-and-configuration-overview). It
is currently not available with local mode. And please also note that
local-cluster mode with multiple workers is not supported(see Standalone
documen [...]
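The address-based allocation described in the hunk above can be modeled with a short sketch. This is a toy illustration only, not Spark's actual scheduler; the function name `assign_addresses` and the data shapes are invented for the example.

```python
# Toy model of address-based resource scheduling: an executor reports the
# resource addresses it discovered, and the scheduler hands each task the
# amount it requested. Purely illustrative; not Spark code.

def assign_addresses(free_addresses, amount_per_task, num_tasks):
    """Give each task `amount_per_task` addresses from the free pool."""
    assignments = []
    pool = list(free_addresses)
    for _ in range(num_tasks):
        if len(pool) < amount_per_task:
            break  # not enough free resources; remaining tasks must wait
        assignments.append(pool[:amount_per_task])
        pool = pool[amount_per_task:]
    return assignments, pool

# An executor discovered two GPUs ("0" and "1"); each task asks for one GPU.
tasks, remaining = assign_addresses(["0", "1"], amount_per_task=1, num_tasks=3)
print(tasks)      # [['0'], ['1']] -- the third task waits for a free GPU
print(remaining)  # []
```

The key point the sketch mirrors is that scheduling is done in terms of concrete addresses reported by the executor, not just counts.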
diff --git a/docs/mllib-dimensionality-reduction.md
b/docs/mllib-dimensionality-reduction.md
index a3262d21b10e..0cb7b29360c8 100644
--- a/docs/mllib-dimensionality-reduction.md
+++ b/docs/mllib-dimensionality-reduction.md
@@ -66,10 +66,12 @@ first and then compute its top eigenvalues and eigenvectors
locally on the drive
This requires a single pass with $O(n^2)$ storage on each executor and on the
driver, and
$O(n^2 k)$ time on the driver.
* Otherwise, we compute $(A^T A) v$ in a distributive way and send it to
-<a href="http://www.caam.rice.edu/software/ARPACK/">ARPACK</a> to
+[ARPACK][arpack] to
compute $(A^T A)$'s top eigenvalues and eigenvectors on the driver node. This
requires $O(k)$
passes, $O(n)$ storage on each executor, and $O(n k)$ storage on the driver.
+[arpack]: https://web.archive.org/web/20210503024933/http://www.caam.rice.edu/software/ARPACK
+
### SVD Example
`spark.mllib` provides SVD functionality to row-oriented matrices, provided in
the
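The $(A^T A) v$ product mentioned in this hunk can be formed block-by-block without ever materializing $A^T A$: each executor's row block $A_i$ contributes $A_i^T (A_i v)$, and the driver sums the contributions. A minimal plain-Python sketch of that idea (the name `gram_multiply` is made up; this is not Spark's implementation):

```python
# Compute (A^T A) v as the sum of A_i^T (A_i v) over row blocks, so each
# "executor" only needs its own rows and O(n) extra storage.

def mat_vec(rows, v):
    """Multiply a list-of-rows matrix by vector v."""
    return [sum(r[j] * v[j] for j in range(len(v))) for r in rows]

def gram_multiply(row_blocks, v):
    """Accumulate A_i^T (A_i v) across row blocks without forming A^T A."""
    n = len(v)
    result = [0.0] * n
    for rows in row_blocks:          # one iteration per "executor"
        av = mat_vec(rows, v)        # A_i v
        for r, s in zip(rows, av):   # add A_i^T (A_i v) into the result
            for j in range(n):
                result[j] += r[j] * s
    return result

# A = [[1, 2], [3, 4]] split into two single-row blocks, v = [1, 1]
blocks = [[[1.0, 2.0]], [[3.0, 4.0]]]
print(gram_multiply(blocks, [1.0, 1.0]))  # [24.0, 34.0]
```

This matches the storage claims in the text: each block pass touches only its own rows, and the driver keeps a single length-$n$ accumulator.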
diff --git a/docs/rdd-programming-guide.md b/docs/rdd-programming-guide.md
index 2e0f9d3bd6ef..7f69272cbeb0 100644
--- a/docs/rdd-programming-guide.md
+++ b/docs/rdd-programming-guide.md
@@ -877,11 +877,13 @@ The most common ones are distributed "shuffle"
operations, such as grouping or a
by a key.
In Scala, these operations are automatically available on RDDs containing
-[Tuple2](http://www.scala-lang.org/api/{{site.SCALA_VERSION}}/index.html#scala.Tuple2)
objects
+[Tuple2][tuple2] objects
(the built-in tuples in the language, created by simply writing `(a, b)`). The
key-value pair operations are available in the
[PairRDDFunctions](api/scala/org/apache/spark/rdd/PairRDDFunctions.html) class,
which automatically wraps around an RDD of tuples.
+[tuple2]: https://www.scala-lang.org/api/{{site.SCALA_VERSION}}/scala/Tuple2.html
+
For example, the following code uses the `reduceByKey` operation on key-value
pairs to count how
many times each line of text occurs in a file:
@@ -909,7 +911,7 @@ The most common ones are distributed "shuffle" operations,
such as grouping or a
by a key.
In Java, key-value pairs are represented using the
-[scala.Tuple2](http://www.scala-lang.org/api/{{site.SCALA_VERSION}}/index.html#scala.Tuple2)
class
+[scala.Tuple2][tuple2] class
from the Scala standard library. You can simply call `new Tuple2(a, b)` to
create a tuple, and access
its fields later with `tuple._1()` and `tuple._2()`.
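The `reduceByKey` semantics this hunk refers to can be sketched in a few lines of plain Python: map each line to a `(line, 1)` pair, then merge the values for each key with the supplied function. A toy model only, not the Spark API:

```python
# Toy model of reduceByKey: group values by key, then fold each group
# with the supplied associative function.
from collections import defaultdict
from functools import reduce

def reduce_by_key(pairs, func):
    """Merge the values for each key using func, like RDD.reduceByKey."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(func, values) for key, values in groups.items()}

lines = ["spark", "hadoop", "spark"]
pairs = [(line, 1) for line in lines]            # map each line to (line, 1)
counts = reduce_by_key(pairs, lambda a, b: a + b)
print(counts)                                     # {'spark': 2, 'hadoop': 1}
```

In real Spark the merge function must be associative and commutative so partial results can be combined per partition before the shuffle; the dictionary above stands in for that shuffle.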
diff --git a/docs/sql-data-sources-avro.md b/docs/sql-data-sources-avro.md
index 2172cb68fb98..ddfdc89370b1 100644
--- a/docs/sql-data-sources-avro.md
+++ b/docs/sql-data-sources-avro.md
@@ -438,10 +438,11 @@ built-in but external module, both implicit classes are
removed. Please use `.fo
If you prefer using your own build of `spark-avro` jar file, you can simply
disable the configuration
`spark.sql.legacy.replaceDatabricksSparkAvro.enabled`, and use the option
`--jars` on deploying your
-applications. Read the [Advanced Dependency Management](https://spark.apache
-.org/docs/latest/submitting-applications.html#advanced-dependency-management) section in Application
+applications. Read the [Advanced Dependency Management][adm] section in the Application
Submission Guide for more details.
+[adm]: submitting-applications.html#advanced-dependency-management
+
## Supported types for Avro -> Spark SQL conversion
Currently Spark supports reading all [primitive
types](https://avro.apache.org/docs/1.11.3/specification/#primitive-types) and
[complex
types](https://avro.apache.org/docs/1.11.3/specification/#complex-types) under
records of Avro.
<table>
diff --git
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index bc4734775c77..054858e1c598 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -5684,7 +5684,12 @@ class SQLConf extends Serializable with Logging with
SqlApiConf {
def getAllDefinedConfs: Seq[(String, String, String, String)] = {
loadDefinedConfs()
getConfigEntries().asScala.filter(_.isPublic).map { entry =>
-      val displayValue = Option(getConfString(entry.key, null)).getOrElse(entry.defaultValueString)
+      val displayValue =
+        // We get the display value in this way rather than call getConfString(entry.key)
+        // because we want the default _definition_ and not the computed value.
+        // e.g. `<undefined>` instead of `null`
+        // e.g. `<value of spark.buffer.size>` instead of `65536`
+        Option(getConfString(entry.key, null)).getOrElse(entry.defaultValueString)
(entry.key, displayValue, entry.doc, entry.version)
}.toSeq
}
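The `Option(...).getOrElse(...)` pattern in the hunk above has a simple shape: prefer the explicitly set value, and fall back to the *definition* of the default rather than a computed value. A hedged Python sketch of that behavior (the conf keys and `display_value` name are invented for illustration):

```python
# Sketch of the displayValue logic: a user-set value wins; otherwise show
# the default's definition string, not whatever it would evaluate to.

def display_value(conf, entry_key, default_value_string):
    """Return the user-set value if present, else the default's definition."""
    set_value = conf.get(entry_key)   # like getConfString(entry.key, null)
    return set_value if set_value is not None else default_value_string

conf = {"spark.sql.shuffle.partitions": "10"}

print(display_value(conf, "spark.sql.shuffle.partitions", "200"))
# explicitly set, so "10"

print(display_value(conf, "some.unset.conf", "<value of spark.buffer.size>"))
# unset: the definition "<value of spark.buffer.size>" is shown,
# not a resolved number like 65536
```

This is exactly the non-intuitive behavior the added comment documents: defaults that reference other confs display as their definition string.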
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]