spark git commit: [SPARK-15533][SQL] Deprecate Dataset.explode

rxin Wed, 25 May 2016 19:11:10 -0700

Repository: spark
Updated Branches:
  refs/heads/master 527499b62 -> 06ed1fa3e



[SPARK-15533][SQL] Deprecate Dataset.explode

## What changes were proposed in this pull request?

This patch deprecates `Dataset.explode` and documents `flatMap()` and 
`functions.explode()` as the recommended alternatives.
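
As a rough illustration of the two alternatives, here is a minimal sketch that assumes a local `SparkSession` and reuses the `Book` case class from the Scaladoc example in this patch. The object name `ExplodeMigration`, its helper methods, and the sample data are hypothetical and are not part of the patch.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, countDistinct, explode, split}

// Mirrors the case class used in the Scaladoc example of this patch.
case class Book(title: String, words: String)

// Hypothetical helper object, for illustration only.
object ExplodeMigration {

  // Untyped alternative: one output row per (title, word) pair.
  def wordsViaFunctions(ds: Dataset[Book]): DataFrame =
    ds.select(col("title"), explode(split(col("words"), " ")).as("word"))

  // Typed alternative: yields only the words; the title is not carried along.
  def wordsViaFlatMap(ds: Dataset[Book]): Dataset[String] = {
    import ds.sparkSession.implicits._ // provides the Encoder[String]
    ds.flatMap(book => book.words.split(" ").toSeq)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-migration")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data.
    val ds = Seq(Book("A", "spark sql"), Book("B", "spark core")).toDS()

    // Number of distinct books that contain each word.
    wordsViaFunctions(ds).groupBy("word").agg(countDistinct("title")).show()

    // Flat list of words across all books.
    wordsViaFlatMap(ds).show()

    spark.stop()
  }
}
```

Note that the two forms are not interchangeable: the `functions.explode()` form keeps the other selected columns next to each exploded value, while `flatMap()` returns only what the function emits, so any column needed downstream has to be included in the emitted value.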

## How was this patch tested?

N/A

Author: Sameer Agarwal <[email protected]>

Closes #13312 from sameeragarwal/deprecate.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/06ed1fa3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/06ed1fa3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/06ed1fa3

Branch: refs/heads/master
Commit: 06ed1fa3e45adfc11b0f615cb8b97c99fadc735f
Parents: 527499b
Author: Sameer Agarwal <[email protected]>
Authored: Wed May 25 19:10:57 2016 -0700
Committer: Reynold Xin <[email protected]>
Committed: Wed May 25 19:10:57 2016 -0700

----------------------------------------------------------------------
 .../scala/org/apache/spark/sql/Dataset.scala    | 33 +++++++++++++-------
 1 file changed, 22 insertions(+), 11 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/06ed1fa3/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
index 78a167e..e5140fc 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
@@ -1556,30 +1556,33 @@ class Dataset[T] private[sql](
   }
 
   /**
-   * :: Experimental ::
    * (Scala-specific) Returns a new [[Dataset]] where each row has been expanded to zero or more
    * rows by the provided function. This is similar to a `LATERAL VIEW` in HiveQL. The columns of
    * the input row are implicitly joined with each row that is output by the function.
    *
-   * The following example uses this function to count the number of books which contain
-   * a given word:
+   * Given that this is deprecated, as an alternative, you can explode columns either using
+   * `functions.explode()` or `flatMap()`. The following example uses these alternatives to count
+   * the number of books that contain a given word:
    *
    * {{{
    *   case class Book(title: String, words: String)
    *   val ds: Dataset[Book]
    *
-   *   case class Word(word: String)
-   *   val allWords = ds.explode('words) {
-   *     case Row(words: String) => words.split(" ").map(Word(_))
-   *   }
+   *   val allWords = ds.select('title, explode(split('words, " ")).as("word"))
    *
    *   val bookCountPerWord = allWords.groupBy("word").agg(countDistinct("title"))
    * }}}
    *
+   * Using `flatMap()` this can similarly be exploded as:
+   *
+   * {{{
+   *   ds.flatMap(_.words.split(" "))
+   * }}}
+   *
    * @group untypedrel
    * @since 2.0.0
    */
-  @Experimental
+  @deprecated("use flatMap() or select() with functions.explode() instead", "2.0.0")
   def explode[A <: Product : TypeTag](input: Column*)(f: Row => TraversableOnce[A]): DataFrame = {
     val elementSchema = ScalaReflection.schemaFor[A].dataType.asInstanceOf[StructType]
 
@@ -1596,19 +1599,27 @@ class Dataset[T] private[sql](
   }
 
   /**
-   * :: Experimental ::
    * (Scala-specific) Returns a new [[Dataset]] where a single column has been expanded to zero
    * or more rows by the provided function. This is similar to a `LATERAL VIEW` in HiveQL. All
    * columns of the input row are implicitly joined with each value that is output by the function.
    *
+   * Given that this is deprecated, as an alternative, you can explode columns either using
+   * `functions.explode()`:
+   *
+   * {{{
+   *   ds.select(explode(split('words, " ")).as("word"))
+   * }}}
+   *
+   * or `flatMap()`:
+   *
    * {{{
-   *   ds.explode("words", "word") {words: String => words.split(" ")}
+   *   ds.flatMap(_.words.split(" "))
    * }}}
    *
    * @group untypedrel
    * @since 2.0.0
    */
-  @Experimental
+  @deprecated("use flatMap() or select() with functions.explode() instead", "2.0.0")
   def explode[A, B : TypeTag](inputColumn: String, outputColumn: String)(f: A => TraversableOnce[B])
     : DataFrame = {
     val dataType = ScalaReflection.schemaFor[B].dataType

