This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 38fc8e2  [MINOR][DOCS] Fix some broken links in docs
38fc8e2 is described below

commit 38fc8e2484aa4971d1f2c115da61fc96f36e7868
Author: Sean Owen <[email protected]>
AuthorDate: Sat Apr 13 22:27:25 2019 +0900

    [MINOR][DOCS] Fix some broken links in docs
    
    ## What changes were proposed in this pull request?
    
    Fix some broken links in docs
    
    ## How was this patch tested?
    
    N/A
    
    Closes #24361 from srowen/BrokenLinks.
    
    Authored-by: Sean Owen <[email protected]>
    Signed-off-by: HyukjinKwon <[email protected]>
---
 docs/hardware-provisioning.md |  2 +-
 docs/ml-advanced.md           |  2 +-
 docs/mllib-clustering.md      |  2 +-
 docs/rdd-programming-guide.md | 10 +++++-----
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/hardware-provisioning.md b/docs/hardware-provisioning.md
index dab65e2..4e5d681 100644
--- a/docs/hardware-provisioning.md
+++ b/docs/hardware-provisioning.md
@@ -45,7 +45,7 @@ nodes than the storage system to avoid interference.
 While Spark can perform a lot of its computation in memory, it still uses local disks to store
 data that doesn't fit in RAM, as well as to preserve intermediate output between stages. We
 recommend having **4-8 disks** per node, configured _without_ RAID (just as separate mount points).
-In Linux, mount the disks with the [`noatime` option](http://www.centos.org/docs/5/html/Global_File_System/s2-manage-mountnoatime.html)
+In Linux, mount the disks with the `noatime` option
 to reduce unnecessary writes. In Spark, [configure](configuration.html) the `spark.local.dir`
 variable to be a comma-separated list of the local disks. If you are running HDFS, it's fine to
 use the same disks as HDFS.
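
Not part of the patch: a minimal Scala sketch of the `spark.local.dir` setting that the hunk above describes, assuming hypothetical mount points under /mnt.

    import org.apache.spark.{SparkConf, SparkContext}

    // Point Spark's scratch space (shuffle files, spilled data) at several local disks.
    // The /mnt/diskN/spark paths are hypothetical mount points.
    val conf = new SparkConf()
      .setAppName("local-dir-example")
      .set("spark.local.dir", "/mnt/disk1/spark,/mnt/disk2/spark,/mnt/disk3/spark")
    val sc = new SparkContext(conf)
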
diff --git a/docs/ml-advanced.md b/docs/ml-advanced.md
index 8e8c701..5787fe9 100644
--- a/docs/ml-advanced.md
+++ b/docs/ml-advanced.md
@@ -52,7 +52,7 @@ explicitly in Newton's method. As a result, L-BFGS often achieves faster converg
 other first-order optimizations.
 
 [Orthant-Wise Limited-memory
-Quasi-Newton](http://research-srv.microsoft.com/en-us/um/people/jfgao/paper/icml07scalable.pdf)
+Quasi-Newton](https://www.microsoft.com/en-us/research/wp-content/uploads/2007/01/andrew07scalable.pdf)
 (OWL-QN) is an extension of L-BFGS that can effectively handle L1 and elastic net regularization.
 
 L-BFGS is used as a solver for [LinearRegression](api/scala/index.html#org.apache.spark.ml.regression.LinearRegression),
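
Not part of the patch: a hedged Scala sketch of the solver choice the lines above refer to. `setSolver("l-bfgs")` is a documented option for `LinearRegression`; the `training` DataFrame is hypothetical.

    import org.apache.spark.ml.regression.LinearRegression

    // Select the L-BFGS solver explicitly and add an elastic-net penalty;
    // a non-zero L1 component is the case the OWL-QN extension handles.
    // `training` is a hypothetical DataFrame of (label, features) rows.
    val lr = new LinearRegression()
      .setSolver("l-bfgs")
      .setRegParam(0.1)
      .setElasticNetParam(0.5)
    // val model = lr.fit(training)
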
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index 18bab11..12c33a5 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -237,7 +237,7 @@ LDA supports different inference algorithms via `setOptimizer` function.
 on the likelihood function and yields comprehensive results, while
 `OnlineLDAOptimizer` uses iterative mini-batch sampling for [online
 variational
-inference](https://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf)
+inference](https://mimno.infosci.cornell.edu/info6150/readings/HoffmanBleiBach2010b.pdf)
 and is generally memory friendly.
 
 LDA takes in a collection of documents as vectors of word counts and the
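
Not part of the patch: a small Scala sketch of choosing between the two optimizers named above via `setOptimizer` in the RDD-based `spark.mllib` API; the `corpus` RDD of (document id, word-count vector) pairs is hypothetical.

    import org.apache.spark.mllib.clustering.{LDA, LDAModel}
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.rdd.RDD

    // Train LDA with the online variational optimizer; pass "em" instead to
    // use EMLDAOptimizer. `corpus` is a hypothetical RDD of
    // (document id, word-count vector) pairs.
    def trainLda(corpus: RDD[(Long, Vector)]): LDAModel =
      new LDA()
        .setK(10)
        .setOptimizer("online")
        .run(corpus)
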
diff --git a/docs/rdd-programming-guide.md b/docs/rdd-programming-guide.md
index b568e94..c937740 100644
--- a/docs/rdd-programming-guide.md
+++ b/docs/rdd-programming-guide.md
@@ -345,7 +345,7 @@ One important parameter for parallel collections is the number of *partitions* t
 
 <div data-lang="scala"  markdown="1">
 
-Spark can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, [Amazon S3](http://wiki.apache.org/hadoop/AmazonS3), etc. Spark supports text files, [SequenceFiles](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), and any other Hadoop [InputFormat](http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/InputFormat.html).
+Spark can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, [Amazon S3](http://wiki.apache.org/hadoop/AmazonS3), etc. Spark supports text files, [SequenceFiles](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), and any other Hadoop [InputFormat](http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/InputFormat.html).
 
 Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes a URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation:
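
Not part of the patch: a minimal sketch of such an invocation, with a placeholder path.

    // Read a local or HDFS text file as an RDD of lines; "data.txt" is a hypothetical path.
    val distFile = sc.textFile("data.txt")
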
 
@@ -368,7 +368,7 @@ Apart from text files, Spark's Scala API also supports several other data format
 
 * `SparkContext.wholeTextFiles` lets you read a directory containing multiple small text files, and returns each of them as (filename, content) pairs. This is in contrast with `textFile`, which would return one record per line in each file. Partitioning is determined by data locality which, in some cases, may result in too few partitions. For those cases, `wholeTextFiles` provides an optional second argument for controlling the minimal number of partitions.
 
-* For [SequenceFiles](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), use SparkContext's `sequenceFile[K, V]` method where `K` and `V` are the types of key and values in the file. These should be subclasses of Hadoop's [Writable](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/Writable.html) interface, like [IntWritable](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/IntWritable.html) an [...]
+* For [SequenceFiles](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), use SparkContext's `sequenceFile[K, V]` method where `K` and `V` are the types of key and values in the file. These should be subclasses of Hadoop's [Writable](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/io/Writable.html) interface, like [IntWritable](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/io/IntWritable.html) and [Text](https://hado [...]
 
 * For other Hadoop InputFormats, you can use the `SparkContext.hadoopRDD` method, which takes an arbitrary `JobConf` and input format class, key class and value class. Set these the same way you would for a Hadoop job with your input source. You can also use `SparkContext.newAPIHadoopRDD` for InputFormats based on the "new" MapReduce API (`org.apache.hadoop.mapreduce`).
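
Not part of the patch: hedged Scala sketches of the `wholeTextFiles` and `sequenceFile[K, V]` calls described in the items above. The paths are hypothetical, and the SequenceFile is assumed to hold IntWritable keys and Text values.

    // One (filename, content) pair per small file in the directory.
    val perFile = sc.wholeTextFiles("hdfs://namenode:8020/logs")

    // IntWritable/Text SequenceFile read through the implicit Writable converters.
    val pairs = sc.sequenceFile[Int, String]("hdfs://namenode:8020/data/seqfile")
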
 
@@ -378,7 +378,7 @@ Apart from text files, Spark's Scala API also supports several other data format
 
 <div data-lang="java"  markdown="1">
 
-Spark can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, [Amazon S3](http://wiki.apache.org/hadoop/AmazonS3), etc. Spark supports text files, [SequenceFiles](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), and any other Hadoop [InputFormat](http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/InputFormat.html).
+Spark can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, [Amazon S3](http://wiki.apache.org/hadoop/AmazonS3), etc. Spark supports text files, [SequenceFiles](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), and any other Hadoop [InputFormat](http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/InputFormat.html).
 
 Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes a URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation:
 
@@ -400,7 +400,7 @@ Apart from text files, Spark's Java API also supports several other data formats
 
 * `JavaSparkContext.wholeTextFiles` lets you read a directory containing multiple small text files, and returns each of them as (filename, content) pairs. This is in contrast with `textFile`, which would return one record per line in each file.
 
-* For [SequenceFiles](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), use SparkContext's `sequenceFile[K, V]` method where `K` and `V` are the types of key and values in the file. These should be subclasses of Hadoop's [Writable](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/Writable.html) interface, like [IntWritable](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/IntWritable.html) an [...]
+* For [SequenceFiles](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), use SparkContext's `sequenceFile[K, V]` method where `K` and `V` are the types of key and values in the file. These should be subclasses of Hadoop's [Writable](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/io/Writable.html) interface, like [IntWritable](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/io/IntWritable.html) and [Text](https://hado [...]
 
 * For other Hadoop InputFormats, you can use the `JavaSparkContext.hadoopRDD` method, which takes an arbitrary `JobConf` and input format class, key class and value class. Set these the same way you would for a Hadoop job with your input source. You can also use `JavaSparkContext.newAPIHadoopRDD` for InputFormats based on the "new" MapReduce API (`org.apache.hadoop.mapreduce`).
 
@@ -410,7 +410,7 @@ Apart from text files, Spark's Java API also supports several other data formats
 
 <div data-lang="python"  markdown="1">
 
-PySpark can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, [Amazon S3](http://wiki.apache.org/hadoop/AmazonS3), etc. Spark supports text files, [SequenceFiles](http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), and any other Hadoop [InputFormat](http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/InputFormat.html).
+PySpark can create distributed datasets from any storage source supported by Hadoop, including your local file system, HDFS, Cassandra, HBase, [Amazon S3](http://wiki.apache.org/hadoop/AmazonS3), etc. Spark supports text files, [SequenceFiles](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/SequenceFileInputFormat.html), and any other Hadoop [InputFormat](http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/InputFormat.html).
 
 Text file RDDs can be created using `SparkContext`'s `textFile` method. This method takes a URI for the file (either a local path on the machine, or a `hdfs://`, `s3a://`, etc URI) and reads it as a collection of lines. Here is an example invocation:
 


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
