spark git commit: [minor doc] Add exploratory data analysis warning for DataFrame.stat.freqItem API

rxin Mon, 01 Jun 2015 21:29:59 -0700

Repository: spark
Updated Branches:
  refs/heads/branch-1.4 d542a35ad -> 3af4c0b4e



[minor doc] Add exploratory data analysis warning for DataFrame.stat.freqItem API

Author: Reynold Xin <[email protected]>

Closes #6569 from rxin/freqItemsWarning and squashes the following commits:

7eec145 [Reynold Xin] [minor doc] Add exploratory data analysis warning for DataFrame.stat.freqItem API.

(cherry picked from commit 4c868b9943a2d86107d1f15f8df9830aac36fb75)
Signed-off-by: Reynold Xin <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3af4c0b4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3af4c0b4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3af4c0b4

Branch: refs/heads/branch-1.4
Commit: 3af4c0b4e86dee523cfc535c44f441cddd2337cc
Parents: d542a35
Author: Reynold Xin <[email protected]>
Authored: Mon Jun 1 21:29:39 2015 -0700
Committer: Reynold Xin <[email protected]>
Committed: Mon Jun 1 21:29:46 2015 -0700

----------------------------------------------------------------------
 python/pyspark/sql/dataframe.py                         |  3 +++
 .../org/apache/spark/sql/DataFrameStatFunctions.scala   | 12 ++++++++++++
 2 files changed, 15 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/3af4c0b4/python/pyspark/sql/dataframe.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 9364875..a82b6b8 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -1170,6 +1170,9 @@ class DataFrame(object):
         "http://dx.doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou".
         :func:`DataFrame.freqItems` and :func:`DataFrameStatFunctions.freqItems` are aliases.
 
+        This function is meant for exploratory data analysis, as we make no guarantee about the
+        backward compatibility of the schema of the resulting DataFrame.
+
         :param cols: Names of the columns to calculate frequent items for as a list or tuple of
             strings.
         :param support: The frequency with which to consider an item 'frequent'. Default is 1%.

http://git-wip-us.apache.org/repos/asf/spark/blob/3af4c0b4/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
index b624eaa..edb9ed7 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
@@ -97,6 +97,9 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
    * [[http://dx.doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou]].
    * The `support` should be greater than 1e-4.
    *
+   * This function is meant for exploratory data analysis, as we make no guarantee about the
+   * backward compatibility of the schema of the resulting [[DataFrame]].
+   *
    * @param cols the names of the columns to search frequent items in.
    * @param support The minimum frequency for an item to be considered `frequent`. Should be greater
    *                than 1e-4.
@@ -114,6 +117,9 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
    * [[http://dx.doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou]].
    * Uses a `default` support of 1%.
    *
+   * This function is meant for exploratory data analysis, as we make no guarantee about the
+   * backward compatibility of the schema of the resulting [[DataFrame]].
+   *
    * @param cols the names of the columns to search frequent items in.
    * @return A Local DataFrame with the Array of frequent items for each column.
    *
@@ -128,6 +134,9 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
    * frequent element count algorithm described in
    * [[http://dx.doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou]].
    *
+   * This function is meant for exploratory data analysis, as we make no guarantee about the
+   * backward compatibility of the schema of the resulting [[DataFrame]].
+   *
    * @param cols the names of the columns to search frequent items in.
    * @return A Local DataFrame with the Array of frequent items for each column.
    *
@@ -143,6 +152,9 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
    * [[http://dx.doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou]].
    * Uses a `default` support of 1%.
    *
+   * This function is meant for exploratory data analysis, as we make no guarantee about the
+   * backward compatibility of the schema of the resulting [[DataFrame]].
+   *
    * @param cols the names of the columns to search frequent items in.
    * @return A Local DataFrame with the Array of frequent items for each column.
    *
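
The doc comments touched by this commit cite the frequent element count
algorithm proposed by Karp, Schenker, and Papadimitriou. As a rough
illustration of that family of one-pass, counter-based algorithms, here is a
minimal pure-Python sketch. It is not Spark's implementation: the function
name `frequent_item_candidates` and the way `support` is turned into a counter
budget are assumptions made for this example. The sketch returns a candidate
set that contains every item whose relative frequency exceeds `support`, but
it may also contain items that are less frequent.

```python
import math


def frequent_item_candidates(items, support=0.01):
    """Return a set containing every item whose relative frequency in
    `items` exceeds `support`. The set may also hold false positives.

    Hypothetical helper for illustration only; not part of Spark.
    """
    if not 0.0 < support < 1.0:
        raise ValueError("support must be strictly between 0 and 1")

    # With k = ceil(1 / support), keeping at most k - 1 counters is enough
    # to retain every item that occurs more than n / k times.
    max_counters = math.ceil(1.0 / support) - 1
    counters = {}

    for item in items:
        if item in counters:
            counters[item] += 1
        elif len(counters) < max_counters:
            counters[item] = 1
        else:
            # No free counter: decrement every tracked item and drop
            # those that reach zero. The current item is discarded.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]

    return set(counters)


if __name__ == "__main__":
    print(frequent_item_candidates(list("abacabad"), support=0.4))
```

Because only a bounded number of counters is kept, memory use depends on
`support` rather than on the number of distinct items. A second pass over the
data would be needed to turn the candidates into exact counts.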


