This is an automated email from the ASF dual-hosted git repository.
srowen pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.1 by this push:
new e6f4859 [SPARK-36209][PYTHON][DOCS] Fix link to pyspark Dataframe documentation
e6f4859 is described below
commit e6f48596f32b84ec99fdb8ca5ac85126f470b394
Author: Dominik Gehl <[email protected]>
AuthorDate: Thu Jul 22 08:07:00 2021 -0500
[SPARK-36209][PYTHON][DOCS] Fix link to pyspark Dataframe documentation
### What changes were proposed in this pull request?
Bugfix: link to the correct location of the PySpark DataFrame documentation
### Why are the changes needed?
Current website returns "Not found"
### Does this PR introduce _any_ user-facing change?
Website fix
### How was this patch tested?
Documentation change
Closes #33420 from dominikgehl/feature/SPARK-36209.
Authored-by: Dominik Gehl <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit 3a1db2ddd439a6df2a1dd896aab8420a9b45286b)
Signed-off-by: Sean Owen <[email protected]>
---
docs/ml-migration-guide.md | 2 +-
docs/ml-pipeline.md | 2 +-
docs/rdd-programming-guide.md | 12 ++++++------
docs/sql-migration-guide.md | 4 ++--
docs/sql-programming-guide.md | 2 +-
docs/streaming-kinesis-integration.md | 2 +-
docs/streaming-programming-guide.md | 11 +++++------
docs/structured-streaming-programming-guide.md | 14 +++++++-------
8 files changed, 24 insertions(+), 25 deletions(-)
diff --git a/docs/ml-migration-guide.md b/docs/ml-migration-guide.md
index 43b8de8..8350ca5 100644
--- a/docs/ml-migration-guide.md
+++ b/docs/ml-migration-guide.md
@@ -269,7 +269,7 @@ mlVec = mllibVec.asML()
mlMat = mllibMat.asML()
{% endhighlight %}
-Refer to the [`MLUtils` Python docs](api/python/pyspark.mllib.html#pyspark.mllib.util.MLUtils) for further detail.
+Refer to the [`MLUtils` Python docs](api/python/reference/api/pyspark.mllib.util.MLUtils.html#pyspark.mllib.util.MLUtils) for further detail.
</div>
</div>
diff --git a/docs/ml-pipeline.md b/docs/ml-pipeline.md
index 8a9599e..105b127 100644
--- a/docs/ml-pipeline.md
+++ b/docs/ml-pipeline.md
@@ -240,7 +240,7 @@ This section gives code examples illustrating the functionality discussed above.
For more info, please refer to the API documentation
([Scala](api/scala/org/apache/spark/ml/package.html),
[Java](api/java/org/apache/spark/ml/package-summary.html),
-and [Python](api/python/pyspark.ml.html)).
+and [Python](api/python/reference/pyspark.ml.html)).
## Example: Estimator, Transformer, and Param
diff --git a/docs/rdd-programming-guide.md b/docs/rdd-programming-guide.md
index acc682b..65cfb57 100644
--- a/docs/rdd-programming-guide.md
+++ b/docs/rdd-programming-guide.md
@@ -177,8 +177,8 @@ JavaSparkContext sc = new JavaSparkContext(conf);
<div data-lang="python" markdown="1">
-The first thing a Spark program must do is to create a [SparkContext](api/python/pyspark.html#pyspark.SparkContext) object, which tells Spark
-how to access a cluster. To create a `SparkContext` you first need to build a [SparkConf](api/python/pyspark.html#pyspark.SparkConf) object
+The first thing a Spark program must do is to create a [SparkContext](api/python/reference/api/pyspark.SparkContext.html#pyspark.SparkContext) object, which tells Spark
+how to access a cluster. To create a `SparkContext` you first need to build a [SparkConf](api/python/reference/api/pyspark.SparkConf.html#pyspark.SparkConf) object
that contains information about your application.
{% highlight python %}
@@ -948,7 +948,7 @@ The following table lists some of the common transformations supported by Spark.
RDD API doc
([Scala](api/scala/org/apache/spark/rdd/RDD.html),
[Java](api/java/index.html?org/apache/spark/api/java/JavaRDD.html),
- [Python](api/python/pyspark.html#pyspark.RDD),
+ [Python](api/python/reference/api/pyspark.RDD.html#pyspark.RDD),
[R](api/R/index.html))
and pair RDD functions doc
([Scala](api/scala/org/apache/spark/rdd/PairRDDFunctions.html),
@@ -1062,7 +1062,7 @@ The following table lists some of the common actions supported by Spark. Refer t
RDD API doc
([Scala](api/scala/org/apache/spark/rdd/RDD.html),
[Java](api/java/index.html?org/apache/spark/api/java/JavaRDD.html),
- [Python](api/python/pyspark.html#pyspark.RDD),
+ [Python](api/python/reference/api/pyspark.RDD.html#pyspark.RDD),
[R](api/R/index.html))
and pair RDD functions doc
@@ -1210,7 +1210,7 @@ replicate it across nodes.
These levels are set by passing a
`StorageLevel` object
([Scala](api/scala/org/apache/spark/storage/StorageLevel.html),
[Java](api/java/index.html?org/apache/spark/storage/StorageLevel.html),
-[Python](api/python/pyspark.html#pyspark.StorageLevel))
+[Python](api/python/reference/api/pyspark.StorageLevel.html#pyspark.StorageLevel))
to `persist()`. The `cache()` method is a shorthand for using the default
storage level,
which is `StorageLevel.MEMORY_ONLY` (store deserialized objects in memory).
The full set of
storage levels is:
@@ -1515,7 +1515,7 @@ Accumulator<id=0, value=0>
{% endhighlight %}
While this code used the built-in support for accumulators of type Int, programmers can also
-create their own types by subclassing [AccumulatorParam](api/python/pyspark.html#pyspark.AccumulatorParam).
+create their own types by subclassing [AccumulatorParam](api/python/reference/api/pyspark.AccumulatorParam.html#pyspark.AccumulatorParam).
The AccumulatorParam interface has two methods: `zero` for providing a "zero value" for your data
type, and `addInPlace` for adding two values together. For example, supposing we had a `Vector` class
representing mathematical vectors, we could write:
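The `zero`/`addInPlace` contract that the guide describes can be illustrated without a running Spark cluster. The class below is a hypothetical, Spark-free sketch; a real custom accumulator would subclass `pyspark.AccumulatorParam` instead of standing alone:

```python
# Spark-free sketch of the two-method AccumulatorParam contract described
# above. A real implementation would subclass pyspark.AccumulatorParam and
# be passed to SparkContext.accumulator().
class VectorAccumulatorParam:
    def zero(self, value):
        # Provide a "zero value" with the same shape as the initial value.
        return [0.0] * len(value)

    def addInPlace(self, v1, v2):
        # Combine two partial results element-wise.
        return [a + b for a, b in zip(v1, v2)]

param = VectorAccumulatorParam()
acc = param.zero([1.0, 2.0, 3.0])             # [0.0, 0.0, 0.0]
acc = param.addInPlace(acc, [1.0, 2.0, 3.0])
acc = param.addInPlace(acc, [0.5, 0.5, 0.5])
print(acc)  # [1.5, 2.5, 3.5]
```

Spark calls `zero` once per partition and `addInPlace` to merge partial results, so both methods must be associative and side-effect free.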
diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index c7f233d..7a905bf 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -687,11 +687,11 @@ and deprecated the old APIs (e.g., `SQLContext.parquetFile`, `SQLContext.jsonFil
See the API docs for `SQLContext.read` (
<a href="api/scala/org/apache/spark/sql/SQLContext.html#read:DataFrameReader">Scala</a>,
<a href="api/java/org/apache/spark/sql/SQLContext.html#read()">Java</a>,
- <a href="api/python/pyspark.sql.html#pyspark.sql.SQLContext.read">Python</a>
+ <a href="api/python/reference/api/pyspark.sql.SparkSession.read.html#pyspark.sql.SparkSession.read">Python</a>
) and `DataFrame.write` (
<a href="api/scala/org/apache/spark/sql/DataFrame.html#write:DataFrameWriter">Scala</a>,
<a href="api/java/org/apache/spark/sql/Dataset.html#write()">Java</a>,
- <a href="api/python/pyspark.sql.html#pyspark.sql.DataFrame.write">Python</a>
+ <a href="api/python/reference/api/pyspark.sql.DataFrame.write.html#pyspark.sql.DataFrame.write">Python</a>
) for more information.
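The `SQLContext.read`/`DataFrame.write` pair that replaced `parquetFile` and `jsonFile` is a fluent builder API. The toy class below is a hypothetical, Spark-free sketch of that chained shape only; real code uses `spark.read` and `DataFrame.write` from pyspark:

```python
# Hypothetical, Spark-free sketch of the fluent reader shape that replaced
# SQLContext.parquetFile / SQLContext.jsonFile. Real code would call
# spark.read.format(...).option(...).load(...).
class ToyReader:
    def __init__(self):
        self._format = "parquet"   # default format, as in Spark
        self._options = {}

    def format(self, fmt):
        self._format = fmt
        return self                # returning self enables chaining

    def option(self, key, value):
        self._options[key] = value
        return self

    def load(self, path):
        # Stand-in for building a DataFrame from the configured source.
        return {"format": self._format, "path": path, "options": dict(self._options)}

df = ToyReader().format("json").option("multiLine", "true").load("/data/events")
print(df["format"])  # json
```

The same builder pattern applies symmetrically on the write side (`format`, `option`, then `save`).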
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index 06bf553..c957c59 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -56,7 +56,7 @@ equivalent to a table in a relational database or a data frame in R/Python, but
optimizations under the hood. DataFrames can be constructed from a wide array
of [sources](sql-data-sources.html) such
as: structured data files, tables in Hive, external databases, or existing
RDDs.
The DataFrame API is available in Scala,
-Java, [Python](api/python/pyspark.sql.html#pyspark.sql.DataFrame), and [R](api/R/index.html).
+Java, [Python](api/python/reference/api/pyspark.sql.DataFrame.html#pyspark.sql.DataFrame), and [R](api/R/index.html).
In Scala and Java, a DataFrame is represented by a Dataset of `Row`s.
In [the Scala API][scala-datasets], `DataFrame` is simply a type alias of `Dataset[Row]`.
While, in [Java API][java-datasets], users need to use `Dataset<Row>` to represent a `DataFrame`.
diff --git a/docs/streaming-kinesis-integration.md b/docs/streaming-kinesis-integration.md
index c7959d4..905326b 100644
--- a/docs/streaming-kinesis-integration.md
+++ b/docs/streaming-kinesis-integration.md
@@ -92,7 +92,7 @@ A Kinesis stream can be set up at one of the valid Kinesis endpoints with 1 or m
streamingContext, [Kinesis app name], [Kinesis stream name],
[endpoint URL],
[region name], [initial position], [checkpoint interval],
StorageLevel.MEMORY_AND_DISK_2)
- See the [API docs](api/python/pyspark.streaming.html#pyspark.streaming.kinesis.KinesisUtils)
+ See the [API docs](api/python/reference/pyspark.streaming.html#kinesis)
and the [example]({{site.SPARK_GITHUB_URL}}/tree/master/external/kinesis-asl/src/main/python/examples/streaming/kinesis_wordcount_asl.py).
Refer to the [Running the Example](#running-the-example) subsection for
instructions to run the example.
</div>
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index 56a455a..b13bf8d 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -221,7 +221,7 @@ The complete code can be found in the Spark Streaming example
</div>
<div data-lang="python" markdown="1" >
-First, we import [StreamingContext](api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext), which is the main entry point for all streaming functionality. We create a local StreamingContext with two execution threads, and batch interval of 1 second.
+First, we import [StreamingContext](api/python/reference/api/pyspark.streaming.StreamingContext.html#pyspark.streaming.StreamingContext), which is the main entry point for all streaming functionality. We create a local StreamingContext with two execution threads, and batch interval of 1 second.
{% highlight python %}
from pyspark import SparkContext
@@ -503,7 +503,7 @@ JavaStreamingContext ssc = new JavaStreamingContext(sc, Durations.seconds(1));
</div>
<div data-lang="python" markdown="1">
-A [StreamingContext](api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext) object can be created from a [SparkContext](api/python/pyspark.html#pyspark.SparkContext) object.
+A [StreamingContext](api/python/reference/api/pyspark.streaming.StreamingContext.html#pyspark.streaming.StreamingContext) object can be created from a [SparkContext](api/python/reference/api/pyspark.SparkContext.html#pyspark.SparkContext) object.
{% highlight python %}
from pyspark import SparkContext
@@ -741,7 +741,7 @@ For testing a Spark Streaming application with test data, one can also create a
For more details on streams from sockets and files, see the API documentations of the relevant functions in
[StreamingContext](api/scala/org/apache/spark/streaming/StreamingContext.html) for Scala, [JavaStreamingContext](api/java/index.html?org/apache/spark/streaming/api/java/JavaStreamingContext.html)
-for Java, and [StreamingContext](api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext) for Python.
+for Java, and [StreamingContext](api/python/reference/api/pyspark.streaming.StreamingContext.html#pyspark.streaming.StreamingContext) for Python.
### Advanced Sources
{:.no_toc}
@@ -1223,7 +1223,7 @@ see
[DStream](api/scala/org/apache/spark/streaming/dstream/DStream.html)
and
[PairDStreamFunctions](api/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.html).
For the Java API, see [JavaDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaDStream.html) and [JavaPairDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaPairDStream.html).
-For the Python API, see [DStream](api/python/pyspark.streaming.html#pyspark.streaming.DStream).
+For the Python API, see [DStream](api/python/reference/api/pyspark.streaming.DStream.html#pyspark.streaming.DStream).
***
@@ -2496,8 +2496,7 @@ additional effort may be necessary to achieve exactly-once semantics. There are
* [KafkaUtils](api/java/index.html?org/apache/spark/streaming/kafka/KafkaUtils.html), [KinesisUtils](api/java/index.html?org/apache/spark/streaming/kinesis/KinesisInputDStream.html)
- Python docs
- * [StreamingContext](api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext) and [DStream](api/python/pyspark.streaming.html#pyspark.streaming.DStream)
- * [KafkaUtils](api/python/pyspark.streaming.html#pyspark.streaming.kafka.KafkaUtils)
+ * [StreamingContext](api/python/reference/api/pyspark.streaming.StreamingContext.html#pyspark.streaming.StreamingContext) and [DStream](api/python/reference/api/pyspark.streaming.DStream.html#pyspark.streaming.DStream)
* More examples in [Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples/streaming) and [Java]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/org/apache/spark/examples/streaming)
diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index fbb2d57..3b93ab8 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -498,13 +498,13 @@ to track the read position in the stream. The engine uses checkpointing and writ
# API using Datasets and DataFrames
Since Spark 2.0, DataFrames and Datasets can represent static, bounded data,
as well as streaming, unbounded data. Similar to static Datasets/DataFrames,
you can use the common entry point `SparkSession`
-([Scala](api/scala/org/apache/spark/sql/SparkSession.html)/[Java](api/java/org/apache/spark/sql/SparkSession.html)/[Python](api/python/pyspark.sql.html#pyspark.sql.SparkSession)/[R](api/R/sparkR.session.html) docs)
+([Scala](api/scala/org/apache/spark/sql/SparkSession.html)/[Java](api/java/org/apache/spark/sql/SparkSession.html)/[Python](api/python/reference/api/pyspark.sql.SparkSession.html#pyspark.sql.SparkSession)/[R](api/R/sparkR.session.html) docs)
to create streaming DataFrames/Datasets from streaming sources, and apply the
same operations on them as static DataFrames/Datasets. If you are not familiar
with Datasets/DataFrames, you are strongly advised to familiarize yourself with
them using the
[DataFrame/Dataset Programming Guide](sql-programming-guide.html).
## Creating streaming DataFrames and streaming Datasets
Streaming DataFrames can be created through the `DataStreamReader` interface
-([Scala](api/scala/org/apache/spark/sql/streaming/DataStreamReader.html)/[Java](api/java/org/apache/spark/sql/streaming/DataStreamReader.html)/[Python](api/python/pyspark.sql.html#pyspark.sql.streaming.DataStreamReader) docs)
+([Scala](api/scala/org/apache/spark/sql/streaming/DataStreamReader.html)/[Java](api/java/org/apache/spark/sql/streaming/DataStreamReader.html)/[Python](api/python/reference/api/pyspark.sql.streaming.DataStreamReader.html#pyspark.sql.streaming.DataStreamReader) docs)
returned by `SparkSession.readStream()`. In [R](api/R/read.stream.html), with
the `read.stream()` method. Similar to the read interface for creating static
DataFrame, you can specify the details of the source – data format, schema,
options, etc.
#### Input Sources
@@ -558,7 +558,7 @@ Here are the details of all the sources in Spark.
NOTE 3: Both delete and move actions are best effort. Failing to
delete or move files will not fail the streaming query. Spark may not clean up
some source files in some circumstances - e.g. the application doesn't shut
down gracefully, too many files are queued to clean up.
<br/><br/>
For file-format-specific options, see the related methods in
<code>DataStreamReader</code>
- (<a href="api/scala/org/apache/spark/sql/streaming/DataStreamReader.html">Scala</a>/<a href="api/java/org/apache/spark/sql/streaming/DataStreamReader.html">Java</a>/<a href="api/python/pyspark.sql.html#pyspark.sql.streaming.DataStreamReader">Python</a>/<a
+ (<a href="api/scala/org/apache/spark/sql/streaming/DataStreamReader.html">Scala</a>/<a href="api/java/org/apache/spark/sql/streaming/DataStreamReader.html">Java</a>/<a href="api/python/reference/api/pyspark.sql.streaming.DataStreamReader.html#pyspark.sql.streaming.DataStreamReader">Python</a>/<a
href="api/R/read.stream.html">R</a>).
E.g. for "parquet" format options see
<code>DataStreamReader.parquet()</code>.
<br/><br/>
@@ -1723,7 +1723,7 @@ end-to-end exactly once per query. Ensuring end-to-end exactly once for the last
## Starting Streaming Queries
Once you have defined the final result DataFrame/Dataset, all that is left is
for you to start the streaming computation. To do that, you have to use the
`DataStreamWriter`
-([Scala](api/scala/org/apache/spark/sql/streaming/DataStreamWriter.html)/[Java](api/java/org/apache/spark/sql/streaming/DataStreamWriter.html)/[Python](api/python/pyspark.sql.html#pyspark.sql.streaming.DataStreamWriter) docs)
+([Scala](api/scala/org/apache/spark/sql/streaming/DataStreamWriter.html)/[Java](api/java/org/apache/spark/sql/streaming/DataStreamWriter.html)/[Python](api/python/reference/api/pyspark.sql.streaming.DataStreamWriter.html#pyspark.sql.streaming.DataStreamWriter) docs)
returned through `Dataset.writeStream()`. You will have to specify one or more
of the following in this interface.
- *Details of the output sink:* Data format, location, etc.
@@ -1913,7 +1913,7 @@ Here are the details of all the sinks in Spark.
By default it's disabled.
<br/><br/>
For file-format-specific options, see the related methods in
DataFrameWriter
- (<a href="api/scala/org/apache/spark/sql/DataFrameWriter.html">Scala</a>/<a href="api/java/org/apache/spark/sql/DataFrameWriter.html">Java</a>/<a href="api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter">Python</a>/<a
+ (<a href="api/scala/org/apache/spark/sql/DataFrameWriter.html">Scala</a>/<a href="api/java/org/apache/spark/sql/DataFrameWriter.html">Java</a>/<a href="api/python/reference/api/pyspark.sql.streaming.DataStreamWriter.html#pyspark.sql.streaming.DataStreamWriter">Python</a>/<a
href="api/R/write.stream.html">R</a>).
E.g. for "parquet" format options see
<code>DataFrameWriter.parquet()</code>
</td>
@@ -2456,7 +2456,7 @@ Not available in R.
</div>
</div>
-For more details, please check the docs for DataStreamReader ([Scala](api/scala/org/apache/spark/sql/streaming/DataStreamReader.html)/[Java](api/java/org/apache/spark/sql/streaming/DataStreamReader.html)/[Python](api/python/pyspark.sql.html#pyspark.sql.streaming.DataStreamReader) docs) and DataStreamWriter ([Scala](api/scala/org/apache/spark/sql/streaming/DataStreamWriter.html)/[Java](api/java/org/apache/spark/sql/streaming/DataStreamWriter.html)/[Python](api/python/pyspark.sql.html#pysp
[...]
+For more details, please check the docs for DataStreamReader ([Scala](api/scala/org/apache/spark/sql/streaming/DataStreamReader.html)/[Java](api/java/org/apache/spark/sql/streaming/DataStreamReader.html)/[Python](api/python/reference/api/pyspark.sql.streaming.DataStreamReader.html#pyspark.sql.streaming.DataStreamReader) docs) and DataStreamWriter ([Scala](api/scala/org/apache/spark/sql/streaming/DataStreamWriter.html)/[Java](api/java/org/apache/spark/sql/streaming/DataStreamWriter.html)/
[...]
#### Triggers
The trigger settings of a streaming query define the timing of streaming data
processing, whether
@@ -2727,7 +2727,7 @@ lastProgress(query) # the most recent progress update of this streaming qu
</div>
You can start any number of queries in a single SparkSession. They will all be
running concurrently sharing the cluster resources. You can use
`sparkSession.streams()` to get the `StreamingQueryManager`
-([Scala](api/scala/org/apache/spark/sql/streaming/StreamingQueryManager.html)/[Java](api/java/org/apache/spark/sql/streaming/StreamingQueryManager.html)/[Python](api/python/pyspark.sql.html#pyspark.sql.streaming.StreamingQueryManager) docs)
+([Scala](api/scala/org/apache/spark/sql/streaming/StreamingQueryManager.html)/[Java](api/java/org/apache/spark/sql/streaming/StreamingQueryManager.html)/[Python](api/python/reference/api/pyspark.sql.streaming.StreamingQueryManager.html#pyspark.sql.streaming.StreamingQueryManager) docs)
that can be used to manage the currently active queries.
<div class="codetabs">
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]