Repository: spark
Updated Branches:
  refs/heads/master 1842cdd4e -> 688b6ef9d

[SPARK-15932][SQL][DOC] document the contract of encoder serializer expressions

## What changes were proposed in this pull request?

In our encoder framework, we implicitly require that serializer expressions use `BoundReference` to refer to the input object, and a lot of code depends on this contract (e.g. `ExpressionEncoder.tuple`). This PR adds documentation and assertions to `ExpressionEncoder` to make the contract explicit.

## How was this patch tested?

Existing tests.

Author: Wenchen Fan <wenc...@databricks.com>

Closes #13648 from cloud-fan/comment.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/688b6ef9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/688b6ef9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/688b6ef9

Branch: refs/heads/master
Commit: 688b6ef9dc0943d268fab7279ef50bfac1617f04
Parents: 1842cdd
Author: Wenchen Fan <wenc...@databricks.com>
Authored: Mon Jun 13 22:02:23 2016 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Mon Jun 13 22:02:23 2016 -0700

----------------------------------------------------------------------
 .../spark/sql/catalyst/encoders/ExpressionEncoder.scala | 9 +++++++++
 1 file changed, 9 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/688b6ef9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
index 688082d..0023ce6 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
@@ -197,6 +197,15 @@ case class ExpressionEncoder[T](
   if (flat) require(serializer.size == 1)
 
+  // serializer expressions are used to encode an object to a row, while the object is usually an
+  // intermediate value produced inside an operator, not from the output of the child operator. This
+  // is quite different from normal expressions, and `AttributeReference` doesn't work here
+  // (intermediate value is not an attribute). We assume that all serializer expressions use a same
+  // `BoundReference` to refer to the object, and throw exception if they don't.
+  assert(serializer.forall(_.references.isEmpty), "serializer cannot reference to any attributes.")
+  assert(serializer.flatMap(_.collect { case b: BoundReference => b}).distinct.length <= 1,
+    "all serializer expressions must use the same BoundReference.")
+
   /**
    * Returns a new copy of this encoder, where the `deserializer` is resolved and bound to the
    * given schema.
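
The contract stated in the added comment can be illustrated outside Spark. The following is a minimal sketch under stated assumptions: `Expr`, `BoundRef`, `AttrRef`, `GetField` and `SerializerContract.violations` are hypothetical toy stand-ins written for this example, not Catalyst's real `Expression`, `BoundReference` or `AttributeReference` classes. It only mirrors the two checks from the patch: no serializer expression may reference an attribute, and all serializer expressions together may contain at most one distinct bound reference.

```scala
// Toy stand-ins for Catalyst expression trees. These are NOT Spark's classes;
// every name here is hypothetical and exists only for illustration.
sealed trait Expr {
  def children: Seq[Expr]

  // Pre-order traversal: gather a result for every node the partial function matches.
  def collect[A](pf: PartialFunction[Expr, A]): Seq[A] =
    pf.lift(this).toSeq ++ children.flatMap(_.collect(pf))

  // Names of all attribute references found anywhere in this tree.
  def references: Set[String] = collect { case AttrRef(name) => name }.toSet
}

// Refers to the input object by position, like a bound reference.
final case class BoundRef(ordinal: Int, dataType: String) extends Expr {
  val children: Seq[Expr] = Nil
}

// Refers to a named output column of a child operator.
final case class AttrRef(name: String) extends Expr {
  val children: Seq[Expr] = Nil
}

// Extracts one field from the value produced by `child`.
final case class GetField(child: Expr, field: String) extends Expr {
  val children: Seq[Expr] = Seq(child)
}

object SerializerContract {
  // Returns one message per violated rule; an empty result means the contract holds.
  def violations(serializer: Seq[Expr]): Seq[String] = {
    val attrs = serializer.flatMap(_.references).distinct
    val bound = serializer.flatMap(_.collect { case b: BoundRef => b }).distinct

    val attrProblem =
      if (attrs.nonEmpty) Seq(s"serializer references attributes: ${attrs.mkString(", ")}")
      else Nil
    val boundProblem =
      if (bound.length > 1) Seq(s"serializer uses ${bound.length} distinct bound references")
      else Nil

    attrProblem ++ boundProblem
  }
}
```

In this sketch, two field extractions that share `BoundRef(0, "Person")` produce no violations, while mixing `BoundRef(0, ...)` with `BoundRef(1, ...)`, or using an `AttrRef`, produces one violation each. Returning messages instead of calling `assert` is a choice made for the example so the result can be inspected; the patch itself uses assertions.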