You can groupByKey and then cogroup.

On Thu, Nov 10, 2016 at 10:44 AM, Yang <teddyyyy...@gmail.com> wrote:
> the new DataSet API is supposed to provide type safety and type checks at
> compile time
> https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#join-operations
>
> It does this in a lot of places, but I found it still doesn't have a
> type-safe join:
>
> val ds1 = hc.sql("select col1, col2 from mytable")
>
> val ds2 = hc.sql("select col3, col4 from mytable2")
>
> val ds3 = ds1.joinWith(ds2, ds1.col("col1") === ds2.col("col3"))
>
> here spark has no way to make sure (at compile time) that the two columns
> being joined together, "col1" and "col3", are of matching types. This is
> in contrast to an rdd join, where a type mismatch would be detected at
> compile time.
>
> am I missing something?
>
> thanks
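To make the groupByKey + cogroup suggestion concrete, here is a minimal sketch. The case classes, table names, and column types are assumptions for illustration only; the point is that once each side is a typed Dataset, groupByKey fixes the key type, so a key-type mismatch between the two sides fails at compile time rather than at runtime:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative row types; the Int key type is an assumption.
case class T1(col1: Int, col2: String)
case class T2(col3: Int, col4: String)

val spark = SparkSession.builder()
  .appName("typed-cogroup")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// .as[T] converts the untyped DataFrame into a typed Dataset
val ds1 = spark.sql("select col1, col2 from mytable").as[T1]
val ds2 = spark.sql("select col3, col4 from mytable2").as[T2]

// groupByKey pins the key type on each side; if col1 and col3 had
// different types, the cogroup call below would not compile.
val joined = ds1.groupByKey(_.col1)
  .cogroup(ds2.groupByKey(_.col3)) { (key, lefts, rights) =>
    // emit one pair per matching combination: inner-join semantics
    for (l <- lefts.toSeq; r <- rights.toSeq) yield (l, r)
  }
```

Note this trades joinWith's optimized join execution for a shuffle-based cogroup, so it's a type-safety workaround rather than a drop-in replacement.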