Repository: spark
Updated Branches:
  refs/heads/branch-2.0 760e7ac81 -> 595ed8de6
[SPARK-14463][SQL] Document the semantics for read.text

## What changes were proposed in this pull request?

This patch is a follow-up to https://github.com/apache/spark/pull/13104 and adds
documentation to clarify the semantics of read.text with respect to partitioning.

## How was this patch tested?

N/A

Author: Reynold Xin <r...@databricks.com>

Closes #13184 from rxin/SPARK-14463.

(cherry picked from commit 4987f39ac7a694e1c8b8b82246eb4fbd863201c4)
Signed-off-by: Reynold Xin <r...@databricks.com>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/595ed8de
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/595ed8de
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/595ed8de

Branch: refs/heads/branch-2.0
Commit: 595ed8de60c2d0cfde4aaf8aafe44f734d26631a
Parents: 760e7ac
Author: Reynold Xin <r...@databricks.com>
Authored: Wed May 18 19:16:28 2016 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Wed May 18 19:16:34 2016 -0700

----------------------------------------------------------------------
 R/pkg/R/SQLContext.R                                      | 2 ++
 python/pyspark/sql/readwriter.py                          | 3 +++
 .../main/scala/org/apache/spark/sql/DataFrameReader.scala | 8 ++++++--
 3 files changed, 11 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/595ed8de/R/pkg/R/SQLContext.R
----------------------------------------------------------------------
diff --git a/R/pkg/R/SQLContext.R b/R/pkg/R/SQLContext.R
index 3824e0a..6b7a341 100644
--- a/R/pkg/R/SQLContext.R
+++ b/R/pkg/R/SQLContext.R
@@ -298,6 +298,8 @@ parquetFile <- function(sqlContext, ...) {
 #' Create a SparkDataFrame from a text file.
 #'
 #' Loads a text file and returns a SparkDataFrame with a single string column named "value".
+#' If the directory structure of the text files contains partitioning information, those are
+#' ignored in the resulting DataFrame.
 #' Each line in the text file is a new row in the resulting SparkDataFrame.
 #'
 #' @param sqlContext SQLContext to use


http://git-wip-us.apache.org/repos/asf/spark/blob/595ed8de/python/pyspark/sql/readwriter.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index 8e6bce9..855c9d6 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -286,6 +286,9 @@ class DataFrameReader(object):
     @since(1.6)
     def text(self, paths):
         """Loads a text file and returns a [[DataFrame]] with a single string column named "value".
+        If the directory structure of the text files contains partitioning information,
+        those are ignored in the resulting DataFrame. To include partitioning information as
+        columns, use ``read.format('text').load(...)``.

         Each line in the text file is a new row in the resulting DataFrame.
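
To make the documented behavior concrete, the following PySpark sketch (not part
of the patch) contrasts read.text with read.format("text").load on a partitioned
directory. The /tmp/logs path, the year=... partition directories, and the
inferred "year" column type are illustrative assumptions, not taken from the patch.

    # Illustrative sketch only, assuming text files under a hypothetical
    # partitioned layout:
    #   /tmp/logs/year=2015/part-0.txt
    #   /tmp/logs/year=2016/part-0.txt
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("read-text-semantics").getOrCreate()

    # read.text ignores the partitioning information: only the "value" column.
    spark.read.text("/tmp/logs").printSchema()
    # root
    #  |-- value: string (nullable = true)

    # read.format("text").load(...) also exposes the discovered partition column
    # (its type depends on Spark's partition-value inference).
    spark.read.format("text").load("/tmp/logs").printSchema()
    # root
    #  |-- value: string (nullable = true)
    #  |-- year: integer (nullable = true)
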
http://git-wip-us.apache.org/repos/asf/spark/blob/595ed8de/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
index e33fd83..57a2091 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
@@ -440,10 +440,14 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
   }

   /**
-   * Loads a text file and returns a [[Dataset]] of String. The underlying schema of the Dataset
+   * Loads text files and returns a [[Dataset]] of String. The underlying schema of the Dataset
    * contains a single string column named "value".
    *
-   * Each line in the text file is a new row in the resulting Dataset. For example:
+   * If the directory structure of the text files contains partitioning information, those are
+   * ignored in the resulting Dataset. To include partitioning information as columns, use
+   * `read.format("text").load("...")`.
+   *
+   * Each line in the text files is a new element in the resulting Dataset. For example:
    * {{{
    * // Scala:
    * spark.read.text("/path/to/spark/README.md")
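
As a usage note, the "each line is a new row" behavior described in all three
docstrings can be checked with a short PySpark sketch; this is illustrative only,
and the README path mirrors the example in the Scala scaladoc above.

    # Illustrative sketch only; the path is assumed to point at an existing file.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("read-text-lines").getOrCreate()

    readme = spark.read.text("/path/to/spark/README.md")
    print(readme.count())            # one row per line of the file
    readme.show(3, truncate=False)   # first three lines, one row each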