Hi all, I'm getting some odd behavior when using the joinWith functionality
for Datasets. Here is a small test case:

  val left = List(("a", 1), ("a", 2), ("b", 3), ("c", 4)).toDS()
  val right = List(("a", "x"), ("b", "y"), ("d", "z")).toDS()

  val joined = left.toDF("k", "v").as[(String, Int)].alias("left")
    .joinWith(
      right.toDF("k", "u").as[(String, String)].alias("right"),
      functions.col("left.k") === functions.col("right.k"),
      "right_outer")
    .as[((String, Int), (String, String))]
    .map { case ((k, v), (_, u)) => (k, (v, u)) }
    .as[(String, (Int, String))]

I would expect the result of this right outer join to be:

  (a,(1,x))
  (a,(2,x))
  (b,(3,y))
  (d,(null,z))

but instead I'm getting:

  (a,(1,x))
  (a,(2,x))
  (b,(3,y))
  (null,(-1,z))

Note that the key for the final tuple is null instead of "d". (Also, is
there a reason the value for the left side of the last tuple is -1 and not
null?)
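
One thing that may be relevant: the map above takes k from the left tuple,
and a Scala Int cannot hold null, so (d,(null,z)) is not representable as
(String, (Int, String)). Below is a minimal plain-Scala sketch (no Spark
involved) of taking the key from the right side and wrapping the left value
in an Option. The names RightOuterSketch, L and R are made up for
illustration, and representing the unmatched left side as a null tuple is an
assumption, not something confirmed about joinWith.

```scala
// Plain-Scala model of the rows a right outer join could hand to map;
// nothing here touches Spark.
object RightOuterSketch {
  type L = (String, Int)
  type R = (String, String)

  // Assumption: an unmatched left side shows up as a null tuple.
  val joined: Seq[(L, R)] = Seq(
    (("a", 1), ("a", "x")),
    (("a", 2), ("a", "x")),
    (("b", 3), ("b", "y")),
    (null, ("d", "z"))
  )

  // Take the key from the right side, which is always present in a right
  // outer join, and wrap the possibly missing left value in an Option.
  val result: Seq[(String, (Option[Int], String))] = joined.map {
    case (l, (k, u)) => (k, (Option(l).map(_._2), u))
  }

  def main(args: Array[String]): Unit = result.foreach(println)
}
```

With this model the last row comes out as (d,(None,z)) rather than carrying
a null key.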

-Andy
