[GitHub] spark pull request #21073: [SPARK-23936][SQL][WIP] Implement map_concat
Github user bersprockets commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21073#discussion_r181943238

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -115,6 +116,62 @@ case class MapValues(child: Expression)

   override def prettyName: String = "map_values"
 }

+/**
+ * Returns the union of all the given maps.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(map, ...) - Returns the union of all the given maps",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(map(1, 'a', 2, 'b'), map(2, 'c', 3, 'd'));
+       [[1 -> "a"], [2 -> "c"], [3 -> "d"]]
+  """)
+case class MapConcat(children: Seq[Expression]) extends Expression
+    with CodegenFallback {
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    // this check currently does not allow valueContainsNull to vary,
+    // and unfortunately none of the MapType toString methods include
+    // valueContainsNull for the error message
+    if (children.exists(!_.dataType.isInstanceOf[MapType])) {
+      TypeCheckResult.TypeCheckFailure(
+        s"The given input of function $prettyName should all be of type map, " +
+          "but they are " + children.map(_.dataType.simpleString).mkString("[", ", ", "]"))
+    } else if (children.map(_.dataType).distinct.length > 1) {
+      TypeCheckResult.TypeCheckFailure(
+        s"The given input maps of function $prettyName should all be the same type, " +
+          "but they are " + children.map(_.dataType.simpleString).mkString("[", ", ", "]"))
+    } else {
+      TypeCheckResult.TypeCheckSuccess
+    }
+  }
+
+  override def dataType: MapType = {
+    children.headOption.map(_.dataType.asInstanceOf[MapType])
+      .getOrElse(MapType(keyType = StringType, valueType = StringType))
+  }
+
+  override def nullable: Boolean = false
--- End diff --

@henryr For an empty map:

    scala> df.select(map_concat('map1, 'map2).as('newMap)).show
    +------+
    |newMap|
    +------+
    |    []|
    |    []|
    +------+

The Presto docs (from which the proposed spec comes) are quiet on the matter. Even after looking at the Presto code, I am still hard-pressed to say. I did divine from the Presto code that there should be at least two inputs (and I don't currently verify that).

---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
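The "at least two inputs" behavior bersprockets observed in the Presto code amounts to a simple arity guard. A minimal sketch in plain Scala, detached from Spark's TypeCheckResult machinery (the function name and error text here are illustrative, not the PR's actual code):

```scala
// Hypothetical arity guard of the kind described: map_concat should
// receive at least two map arguments, per the Presto behavior noted above.
def checkArity(args: Seq[Any]): Either[String, Unit] =
  if (args.length < 2)
    Left(s"map_concat requires at least two arguments, but got ${args.length}")
  else
    Right(())

println(checkArity(Seq("map1")))         // a Left with the error message
println(checkArity(Seq("map1", "map2"))) // Right(())
```

In the real expression this check would live in `checkInputDataTypes` and return a `TypeCheckFailure` rather than a `Left`.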
[GitHub] spark pull request #21073: [SPARK-23936][SQL][WIP] Implement map_concat
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21073#discussion_r181915827

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -115,6 +116,62 @@ case class MapValues(child: Expression)
[... same MapConcat hunk as quoted above ...]
+  override def nullable: Boolean = false
--- End diff --

What's the result of `map_concat(NULL, NULL)`?
[GitHub] spark pull request #21073: [SPARK-23936][SQL][WIP] Implement map_concat
GitHub user bersprockets opened a pull request:

    https://github.com/apache/spark/pull/21073

    [SPARK-23936][SQL][WIP] Implement map_concat

## What changes were proposed in this pull request?

Implement the map_concat higher-order function. This is a work in progress; there is no code generation yet.

My current implementation of MapConcat.checkInputDataTypes does not allow valueContainsNull to vary between the input maps. I am not sure what the requirements are here.

I am using a java.util.Map implementation to merge all the maps together, since that makes for a very straightforward implementation. I chose java.util.LinkedHashMap because:

- It allows for a predictable tuple order (essentially, the original insertion order) when building the result MapData. Tests that rely on tuple order, like pyspark's doctests, will work across Java versions.
- java.util.LinkedHashMap seems to be about as fast as java.util.HashMap, at least when used to concatenate big (500+ key/value) maps for 150k rows, and it is much faster than the Scala LinkedHashMap implementation.

## How was this patch tested?

- New tests
- Manual tests
- Ran all sbt SQL tests
- Ran all pyspark sql tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bersprockets/spark SPARK-23936

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21073.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21073

commit 707330cd88b269cb1bbee83b9b6476d05c8d177c
Author: Bruce Robbins
Date:   2018-04-14T23:52:37Z

    Initial commit

commit 84d696313972a237691eb46cad6a478167dbabee
Author: Bruce Robbins
Date:   2018-04-15T02:04:45Z

    Remove unused variable in test

commit d04893bccbd2772eb937125895519228588e74b4
Author: Bruce Robbins
Date:   2018-04-15T03:35:47Z

    Cleanup
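The LinkedHashMap-based merge described in the PR can be sketched in plain Scala, outside of Spark (ordinary Scala maps stand in for Catalyst's MapData, and the function name is illustrative, not the PR's actual code):

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}
import scala.jdk.CollectionConverters._

// Merge maps left to right with last-value-wins semantics for duplicate
// keys. java.util.LinkedHashMap iterates keys in first-insertion order,
// so the result tuple order is predictable across Java versions.
def concatMaps[K, V](maps: Seq[Map[K, V]]): List[(K, V)] = {
  val merged = new JLinkedHashMap[K, V]()
  for (m <- maps; (k, v) <- m) {
    // re-putting an existing key keeps its position but updates its value
    merged.put(k, v)
  }
  merged.asScala.toList
}

// Mirrors the example in the ExpressionDescription:
//   map_concat(map(1, 'a', 2, 'b'), map(2, 'c', 3, 'd'))
println(concatMaps(Seq(Map(1 -> "a", 2 -> "b"), Map(2 -> "c", 3 -> "d"))))
// List((1,a), (2,c), (3,d))
```

Note how key 2 keeps its original position from the first map while taking the value from the second, which is exactly the ordering property the PR description cites as the reason for preferring LinkedHashMap over HashMap.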