Dear all,

I have three questions about equality of org.apache.spark.sql.Row.

(1) If a Row contains a complex type (e.g. Array), is the following 
behavior expected?
If two Rows hold the same array instance, Row.equals returns true (the 
second assert). If two Rows hold different array instances (a1 and a2) 
with the same elements, Row.equals returns false (the third assert).

import org.apache.spark.sql.Row

val a1 = Array(3, 4)
val a2 = Array(3, 4)
val r1 = Row(a1)
val r2 = Row(a2)
assert(a1.sameElements(a2))      // SUCCESS: same elements
assert(Row(a1).equals(Row(a1)))  // SUCCESS: same array instance
assert(r1.equals(r2))            // FAILURE: different array instances

This is because two objects are compared by "o1 != o2" instead of 
"o1.equals(o2)" at 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala#L408
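The array behavior can also be reproduced without Spark. The sketch below 
uses only the Scala standard library and java.util.Arrays; the object name 
ArrayEqualityDemo is hypothetical. It suggests that, for a Scala Array, 
both "!=" and "equals" compare references, so switching to "o1.equals(o2)" 
alone would probably not change the result for arrays.

```scala
object ArrayEqualityDemo {
  val a1: Array[Int] = Array(3, 4)
  val a2: Array[Int] = Array(3, 4)

  def main(args: Array[String]): Unit = {
    // A Scala Array is a plain JVM array, so == and equals
    // both fall back to reference identity.
    assert(a1 == a1)
    assert(a1 != a2)
    assert(!a1.equals(a2))

    // Element-wise comparisons succeed.
    assert(a1.sameElements(a2))
    assert(java.util.Arrays.equals(a1, a2))

    // Seq uses structural equality, so converting first also works.
    assert(a1.toSeq == a2.toSeq)
  }
}
```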

(2) If (1) is expected, where is this behavior described or defined? I 
cannot find a description in the API documentation.
https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/sql/Row.html
https://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-docs/api/scala/index.html#org.apache.spark.sql.Row

(3) If (1) is expected, is there a recommended way to write an equality 
check for two Rows that contain an Array or other complex types (e.g. Map)?
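To make question (3) concrete, here is a minimal sketch of the kind of 
workaround I have in mind. It is not an existing Spark API: the object 
DeepEquals and the method deepEquals are hypothetical names, and the code 
works on plain Scala values, assuming the field values would be obtained 
from each Row with Row.toSeq. Nested Rows are not handled here.

```scala
object DeepEquals {
  // Hypothetical helper: structural equality that also looks inside
  // Arrays, Seqs, and Map values. Everything else falls back to ==.
  def deepEquals(x: Any, y: Any): Boolean = (x, y) match {
    case (a: Array[_], b: Array[_]) =>
      a.length == b.length &&
        a.indices.forall(i => deepEquals(a(i), b(i)))
    case (a: Seq[_], b: Seq[_]) =>
      a.length == b.length &&
        a.zip(b).forall { case (p, q) => deepEquals(p, q) }
    case (a: Map[_, _], b: Map[_, _]) =>
      // Keys are looked up with ordinary equality; values are compared deeply.
      val am = a.asInstanceOf[Map[Any, Any]]
      val bm = b.asInstanceOf[Map[Any, Any]]
      am.size == bm.size &&
        am.forall { case (k, v) => bm.get(k).exists(w => deepEquals(v, w)) }
    case _ => x == y
  }

  def main(args: Array[String]): Unit = {
    // Stand-ins for r1.toSeq and r2.toSeq from the example above.
    val fields1: Seq[Any] = Seq(Array(3, 4), Map("k" -> Array(1)))
    val fields2: Seq[Any] = Seq(Array(3, 4), Map("k" -> Array(1)))
    assert(fields1 != fields2)            // plain equality fails on the arrays
    assert(deepEquals(fields1, fields2))  // deep comparison succeeds
  }
}
```

With Spark on the classpath, the call would presumably look like 
DeepEquals.deepEquals(r1.toSeq, r2.toSeq).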

Best Regards,
Kazuaki Ishizaki, @kiszk
