spark git commit: [SPARK-8437] [DOCS] Corrected: Using directory path without wildcard for filename slow for large number of files with wholeTextFiles and binaryFiles

andrewor14 Tue, 30 Jun 2015 10:08:33 -0700

Repository: spark
Updated Branches:
  refs/heads/branch-1.4 eab1d16a7 -> 255b2be94



[SPARK-8437] [DOCS] Corrected: Using directory path without wildcard for 
filename slow for large number of files with wholeTextFiles and binaryFiles

Note that 'dir/*' can be more efficient in some Hadoop FS implementations than 
'dir/' (scaladoc now fixed by using an HTML entity for *)

Author: Sean Owen <[email protected]>

Closes #7126 from srowen/SPARK-8437.2 and squashes the following commits:

7bb45da [Sean Owen] Note that 'dir/*' can be more efficient in some Hadoop FS 
implementations than 'dir/' (scaladoc now fixed by using an HTML entity for *)

(cherry picked from commit ada384b785c663392a0b69fad5bfe7a0a0584ee0)
Signed-off-by: Andrew Or <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/255b2be9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/255b2be9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/255b2be9

Branch: refs/heads/branch-1.4
Commit: 255b2be94bbd2b527175d8e7a5a2b89fecf8a835
Parents: eab1d16
Author: Sean Owen <[email protected]>
Authored: Tue Jun 30 10:07:26 2015 -0700
Committer: Andrew Or <[email protected]>
Committed: Tue Jun 30 10:07:34 2015 -0700

----------------------------------------------------------------------
 core/src/main/scala/org/apache/spark/SparkContext.scala | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/255b2be9/core/src/main/scala/org/apache/spark/SparkContext.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index b4c0d4c..d499aba 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -824,7 +824,8 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    * }}}
    *
    * @note Small files are preferred, large file is also allowable, but may cause bad performance.
-   *
+   * @note On some filesystems, `.../path/&#42;` can be a more efficient way to read all files
+   *       in a directory rather than `.../path/` or `.../path`
    * @param minPartitions A suggestion value of the minimal splitting number for input data.
    */
   def wholeTextFiles(
@@ -871,9 +872,10 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
    *   (a-hdfs-path/part-nnnnn, its content)
    * }}}
    *
-   * @param minPartitions A suggestion value of the minimal splitting number for input data.
-   *
    * @note Small files are preferred; very large files may cause bad performance.
+   * @note On some filesystems, `.../path/&#42;` can be a more efficient way to read all files
+   *       in a directory rather than `.../path/` or `.../path`
+   * @param minPartitions A suggestion value of the minimal splitting number for input data.
    */
   @Experimental
   def binaryFiles(
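
For illustration only (not part of this commit): a minimal sketch of how a
caller might follow the new @note and pass a trailing wildcard instead of a
bare directory path. The object name and the asGlob helper are hypothetical
and do not exist in Spark; the input directory is taken from the first
command-line argument. Whether the wildcard form is actually faster depends
on the Hadoop FileSystem implementation in use.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WholeTextFilesGlobExample {
  // Hypothetical helper (not a Spark API): turns "dir" or "dir/" into "dir/*".
  // A path that already ends in "/*" is returned unchanged.
  def asGlob(dir: String): String =
    if (dir.endsWith("/*")) dir else dir.stripSuffix("/") + "/*"

  def main(args: Array[String]): Unit = {
    // Assumption: the directory to read is passed as the first argument.
    val dir = args(0)

    // Fall back to a local master only when none was supplied (e.g. by spark-submit).
    val conf = new SparkConf()
      .setAppName("whole-text-files-glob-example")
      .setIfMissing("spark.master", "local[*]")
    val sc = new SparkContext(conf)

    try {
      // Both methods accept a glob pattern in place of a directory path.
      val texts = sc.wholeTextFiles(asGlob(dir)) // RDD of (path, content)
      val blobs = sc.binaryFiles(asGlob(dir))    // RDD of (path, PortableDataStream)
      println(s"text files: ${texts.count()}, binary files: ${blobs.count()}")
    } finally {
      sc.stop()
    }
  }
}
```

Normalizing the path in one place keeps call sites unchanged if the wildcard
turns out not to help on a given filesystem.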


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
