This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new f078998df2f3 [MINOR][DOCS] Miscellaneous documentation improvements
f078998df2f3 is described below
commit f078998df2f3ad61a33b72b2dae18de4951cd15f
Author: Nicholas Chammas <[email protected]>
AuthorDate: Mon Jan 29 10:06:07 2024 +0900
[MINOR][DOCS] Miscellaneous documentation improvements
### What changes were proposed in this pull request?
- Improve the formatting of various code snippets.
- Fix some broken links in the documentation.
- Clarify the non-intuitive behavior of `displayValue` in
`getAllDefinedConfs()`.
### Why are the changes needed?
These are minor quality of life improvements for users and developers alike.
### Does this PR introduce _any_ user-facing change?
Yes, it tweaks some of the links in user-facing documentation.
### How was this patch tested?
Not tested beyond CI.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44919 from nchammas/misc-doc-fixes.
Authored-by: Nicholas Chammas <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
docs/configuration.md | 16 ++++++++++------
docs/mllib-dimensionality-reduction.md | 4 +++-
docs/rdd-programming-guide.md | 6 ++++--
docs/sql-data-sources-avro.md | 5 +++--
.../scala/org/apache/spark/sql/internal/SQLConf.scala | 7 ++++++-
5 files changed, 26 insertions(+), 12 deletions(-)
diff --git a/docs/configuration.md b/docs/configuration.md
index e771c323d369..7fef09781a15 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -88,10 +88,14 @@ val sc = new SparkContext(new SparkConf())
{% endhighlight %}
Then, you can supply configuration values at runtime:
-{% highlight bash %}
-./bin/spark-submit --name "My app" --master local[4] --conf spark.eventLog.enabled=false
-  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar
-{% endhighlight %}
+```sh
+./bin/spark-submit \
+  --name "My app" \
+  --master local[4] \
+  --conf spark.eventLog.enabled=false \
+  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
+  myApp.jar
+```
The Spark shell and [`spark-submit`](submitting-applications.html)
tool support two ways to load configurations dynamically. The first is command
line options,
@@ -3708,9 +3712,9 @@ Also, you can modify or add configurations at runtime:
GPUs and other accelerators have been widely used for accelerating special
workloads, e.g.,
deep learning and signal processing. Spark now supports requesting and
scheduling generic resources, such as GPUs, with a few caveats. The current
implementation requires that the resource have addresses that can be allocated
by the scheduler. It requires your cluster manager to support and be properly
configured with the resources.
-There are configurations available to request resources for the driver: <code>spark.driver.resource.{resourceName}.amount</code>, request resources for the executor(s): <code>spark.executor.resource.{resourceName}.amount</code> and specify the requirements for each task: <code>spark.task.resource.{resourceName}.amount</code>. The <code>spark.driver.resource.{resourceName}.discoveryScript</code> config is required on YARN, Kubernetes and a client side Driver on Spark Standalone. <code>spa [...]
+There are configurations available to request resources for the driver: `spark.driver.resource.{resourceName}.amount`, request resources for the executor(s): `spark.executor.resource.{resourceName}.amount` and specify the requirements for each task: `spark.task.resource.{resourceName}.amount`. The `spark.driver.resource.{resourceName}.discoveryScript` config is required on YARN, Kubernetes and a client side Driver on Spark Standalone. `spark.executor.resource.{resourceName}.discoveryScri [...]
-Spark will use the configurations specified to first request containers with the corresponding resources from the cluster manager. Once it gets the container, Spark launches an Executor in that container which will discover what resources the container has and the addresses associated with each resource. The Executor will register with the Driver and report back the resources available to that Executor. The Spark scheduler can then schedule tasks to each Executor and assign specific reso [...]
+Spark will use the configurations specified to first request containers with the corresponding resources from the cluster manager. Once it gets the container, Spark launches an Executor in that container which will discover what resources the container has and the addresses associated with each resource. The Executor will register with the Driver and report back the resources available to that Executor. The Spark scheduler can then schedule tasks to each Executor and assign specific reso [...]
See your cluster manager specific page for requirements and details on each of
- [YARN](running-on-yarn.html#resource-allocation-and-configuration-overview),
[Kubernetes](running-on-kubernetes.html#resource-allocation-and-configuration-overview)
and [Standalone
Mode](spark-standalone.html#resource-allocation-and-configuration-overview). It
is currently not available with local mode. And please also note that
local-cluster mode with multiple workers is not supported(see Standalone
documen [...]
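The address-based allocation described in the hunk above can be modeled with a short sketch. This is a toy illustration only, not Spark's actual scheduler; the function name `assign_addresses` and the data shapes are invented for the example.

```python
# Toy model of address-based resource scheduling: an executor reports the
# resource addresses it discovered, and the scheduler hands each task the
# amount it requested. Purely illustrative; not Spark code.

def assign_addresses(free_addresses, amount_per_task, num_tasks):
    """Give each task `amount_per_task` addresses from the free pool."""
    assignments = []
    pool = list(free_addresses)
    for _ in range(num_tasks):
        if len(pool) < amount_per_task:
            break  # not enough free resources; remaining tasks must wait
        assignments.append(pool[:amount_per_task])
        pool = pool[amount_per_task:]
    return assignments, pool

# An executor discovered two GPUs ("0" and "1"); each task asks for one GPU.
tasks, remaining = assign_addresses(["0", "1"], amount_per_task=1, num_tasks=3)
print(tasks)      # [['0'], ['1']] -- the third task waits for a free GPU
print(remaining)  # []
```

The key point the sketch mirrors is that scheduling is done in terms of concrete addresses reported by the executor, not just counts.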
diff --git a/docs/mllib-dimensionality-reduction.md
b/docs/mllib-dimensionality-reduction.md
index a3262d21b10e..0cb7b29360c8 100644
--- a/docs/mllib-dimensionality-reduction.md
+++ b/docs/mllib-dimensionality-reduction.md
@@ -66,10 +66,12 @@ first and then compute its top eigenvalues and eigenvectors
locally on the drive
This requires a single pass with $O(n^2)$ storage on each executor and on the
driver, and
$O(n^2 k)$ time on the driver.
* Otherwise, we compute $(A^T A) v$ in a distributive way and send it to
-<a href="http://www.caam.rice.edu/software/ARPACK/">ARPACK</a> to
+[ARPACK][arpack] to
compute $(A^T A)$'s top eigenvalues and eigenvectors on the driver node. This
requires $O(k)$
passes, $O(n)$ storage on each executor, and $O(n k)$ storage on the driver.
+[arpack]: https://web.archive.org/web/20210503024933/http://www.caam.rice.edu/software/ARPACK
+
### SVD Example
`spark.mllib` provides SVD functionality to row-oriented matrices, provided in
the
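The $(A^T A) v$ product mentioned in this hunk can be formed block-by-block without ever materializing $A^T A$: each executor's row block $A_i$ contributes $A_i^T (A_i v)$, and the driver sums the contributions. A minimal plain-Python sketch of that idea (the name `gram_multiply` is made up; this is not Spark's implementation):

```python
# Compute (A^T A) v as the sum of A_i^T (A_i v) over row blocks, so each
# "executor" only needs its own rows and O(n) extra storage.

def mat_vec(rows, v):
    """Multiply a list-of-rows matrix by vector v."""
    return [sum(r[j] * v[j] for j in range(len(v))) for r in rows]

def gram_multiply(row_blocks, v):
    """Accumulate A_i^T (A_i v) across row blocks without forming A^T A."""
    n = len(v)
    result = [0.0] * n
    for rows in row_blocks:          # one iteration per "executor"
        av = mat_vec(rows, v)        # A_i v
        for r, s in zip(rows, av):   # add A_i^T (A_i v) into the result
            for j in range(n):
                result[j] += r[j] * s
    return result

# A = [[1, 2], [3, 4]] split into two single-row blocks, v = [1, 1]
blocks = [[[1.0, 2.0]], [[3.0, 4.0]]]
print(gram_multiply(blocks, [1.0, 1.0]))  # [24.0, 34.0]
```

This matches the storage claims in the text: each block pass touches only its own rows, and the driver keeps a single length-$n$ accumulator.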
diff --git a/docs/rdd-programming-guide.md b/docs/rdd-programming-guide.md
index 2e0f9d3bd6ef..7f69272cbeb0 100644
--- a/docs/rdd-programming-guide.md
+++ b/docs/rdd-programming-guide.md
@@ -877,11 +877,13 @@ The most common ones are distributed "shuffle"
operations, such as grouping or a
by a key.
In Scala, these operations are automatically available on RDDs containing
-[Tuple2](http://www.scala-lang.org/api/{{site.SCALA_VERSION}}/index.html#scala.Tuple2)
objects
+[Tuple2][tuple2] objects
(the built-in tuples in the language, created by simply writing `(a, b)`). The
key-value pair operations are available in the
[PairRDDFunctions](api/scala/org/apache/spark/rdd/PairRDDFunctions.html) class,
which automatically wraps around an RDD of tuples.
+[tuple2]: https://www.scala-lang.org/api/{{site.SCALA_VERSION}}/scala/Tuple2.html
+
For example, the following code uses the `reduceByKey` operation on key-value
pairs to count how
many times each line of text occurs in a file:
@@ -909,7 +911,7 @@ The most common ones are distributed "shuffle" operations,
such as grouping or a
by a key.
In Java, key-value pairs are represented using the
-[scala.Tuple2](http://www.scala-lang.org/api/{{site.SCALA_VERSION}}/index.html#scala.Tuple2)
class
+[scala.Tuple2][tuple2] class
from the Scala standard library. You can simply call `new Tuple2(a, b)` to
create a tuple, and access
its fields later with `tuple._1()` and `tuple._2()`.
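The `reduceByKey` semantics this hunk refers to can be sketched in a few lines of plain Python: map each line to a `(line, 1)` pair, then merge the values for each key with the supplied function. A toy model only, not the Spark API:

```python
# Toy model of reduceByKey: group values by key, then fold each group
# with the supplied associative function.
from collections import defaultdict
from functools import reduce

def reduce_by_key(pairs, func):
    """Merge the values for each key using func, like RDD.reduceByKey."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(func, values) for key, values in groups.items()}

lines = ["spark", "hadoop", "spark"]
pairs = [(line, 1) for line in lines]            # map each line to (line, 1)
counts = reduce_by_key(pairs, lambda a, b: a + b)
print(counts)                                     # {'spark': 2, 'hadoop': 1}
```

In real Spark the merge function must be associative and commutative so partial results can be combined per partition before the shuffle; the dictionary above stands in for that shuffle.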
diff --git a/docs/sql-data-sources-avro.md b/docs/sql-data-sources-avro.md
index 2172cb68fb98..ddfdc89370b1 100644
--- a/docs/sql-data-sources-avro.md
+++ b/docs/sql-data-sources-avro.md
@@ -438,10 +438,11 @@ built-in but external module, both implicit classes are
removed. Please use `.fo
If you prefer using your own build of `spark-avro` jar file, you can simply
disable the configuration
`spark.sql.legacy.replaceDatabricksSparkAvro.enabled`, and use the option
`--jars` on deploying your
-applications. Read the [Advanced Dependency Management](https://spark.apache
-.org/docs/latest/submitting-applications.html#advanced-dependency-management) section in Application
+applications. Read the [Advanced Dependency Management][adm] section in the Application
Submission Guide for more details.
+[adm]: submitting-applications.html#advanced-dependency-management
+
## Supported types for Avro -> Spark SQL conversion
Currently Spark supports reading all [primitive
types](https://avro.apache.org/docs/1.11.3/specification/#primitive-types) and
[complex
types](https://avro.apache.org/docs/1.11.3/specification/#complex-types) under
records of Avro.
<table>
diff --git
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index bc4734775c77..054858e1c598 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -5684,7 +5684,12 @@ class SQLConf extends Serializable with Logging with
SqlApiConf {
def getAllDefinedConfs: Seq[(String, String, String, String)] = {
loadDefinedConfs()
getConfigEntries().asScala.filter(_.isPublic).map { entry =>
-      val displayValue = Option(getConfString(entry.key, null)).getOrElse(entry.defaultValueString)
+      val displayValue =
+        // We get the display value in this way rather than call getConfString(entry.key)
+        // because we want the default _definition_ and not the computed value.
+        // e.g. `<undefined>` instead of `null`
+        // e.g. `<value of spark.buffer.size>` instead of `65536`
+        Option(getConfString(entry.key, null)).getOrElse(entry.defaultValueString)
(entry.key, displayValue, entry.doc, entry.version)
}.toSeq
}
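The `Option(...).getOrElse(...)` pattern in the hunk above has a simple shape: prefer the explicitly set value, and fall back to the *definition* of the default rather than a computed value. A hedged Python sketch of that behavior (the conf keys and `display_value` name are invented for illustration):

```python
# Sketch of the displayValue logic: a user-set value wins; otherwise show
# the default's definition string, not whatever it would evaluate to.

def display_value(conf, entry_key, default_value_string):
    """Return the user-set value if present, else the default's definition."""
    set_value = conf.get(entry_key)   # like getConfString(entry.key, null)
    return set_value if set_value is not None else default_value_string

conf = {"spark.sql.shuffle.partitions": "10"}

print(display_value(conf, "spark.sql.shuffle.partitions", "200"))
# explicitly set, so "10"

print(display_value(conf, "some.unset.conf", "<value of spark.buffer.size>"))
# unset: the definition "<value of spark.buffer.size>" is shown,
# not a resolved number like 65536
```

This is exactly the non-intuitive behavior the added comment documents: defaults that reference other confs display as their definition string.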
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]