Which release of Spark are you using? Is it possible to come up with fake data that shows what you described?
Thanks

On Fri, May 27, 2016 at 8:24 AM, Tim Gautier <tim.gaut...@gmail.com> wrote:

> Unfortunately I can't show exactly the data I'm using, but this is what
> I'm seeing:
>
> I have a case class 'Product' that represents a table in our database. I
> load that data via
> sqlContext.read.format("jdbc").options(...).load.as[Product]
> and register it in a temp table 'product'.
>
> For testing, I created a new Dataset that has only 3 records in it:
>
> val ts = sqlContext.sql("select * from product where
>   product_catalog_id in (1, 2, 3)").as[Product]
>
> I also created another one using the same case class and data, but from a
> sequence instead.
>
> val ds: Dataset[Product] = Seq(
>   Product(Some(1), ...),
>   Product(Some(2), ...),
>   Product(Some(3), ...)
> ).toDS
>
> The Spark shell tells me these are exactly the same type at this point,
> but they don't behave the same.
>
> ts.as("ts1").joinWith(ts.as("ts2"),
>   $"ts1.product_catalog_id" === $"ts2.product_catalog_id")
> ds.as("ds1").joinWith(ds.as("ds2"),
>   $"ds1.product_catalog_id" === $"ds2.product_catalog_id")
>
> Again, Spark tells me these self-joins return exactly the same type, but
> when I do a .show on them, only the one created from a Seq works. The one
> created by reading from the database throws this error:
>
> org.apache.spark.sql.AnalysisException: cannot resolve
> 'ts1.product_catalog_id' given input columns: [..., product_catalog_id,
> ...];
>
> Is this a bug? Is there any way to make the Dataset loaded from a table
> behave like the one created from a sequence?
>
> Thanks,
> Tim
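For reference, this is the kind of self-contained fake-data sketch that would help us reproduce it, assuming a 1.6.x spark-shell with sqlContext in scope. The Product fields and the JDBC connection options below are placeholders I made up, not the real schema or source:

    // Minimal fake-data sketch of the two cases above, assuming a Spark
    // 1.6.x shell. Schema and JDBC options are placeholders.
    import org.apache.spark.sql.Dataset
    import sqlContext.implicits._

    case class Product(product_catalog_id: Option[Int], name: String)

    // Case 1: Dataset built from a local Seq -- the aliased self-join
    // reportedly resolves and shows rows.
    val ds: Dataset[Product] = Seq(
      Product(Some(1), "a"),
      Product(Some(2), "b"),
      Product(Some(3), "c")
    ).toDS

    ds.as("ds1")
      .joinWith(ds.as("ds2"),
        $"ds1.product_catalog_id" === $"ds2.product_catalog_id")
      .show()

    // Case 2: the same rows read through the JDBC source -- this is the
    // one that reportedly fails to resolve 'ts1.product_catalog_id'.
    // The url and dbtable values are placeholders; the table is assumed
    // to already exist with matching rows.
    val ts = sqlContext.read.format("jdbc")
      .option("url", "jdbc:h2:mem:testdb")
      .option("dbtable", "product")
      .load()
      .as[Product]

    ts.as("ts1")
      .joinWith(ts.as("ts2"),
        $"ts1.product_catalog_id" === $"ts2.product_catalog_id")
      .show()  // expected per the report: AnalysisException on 'ts1.product_catalog_id'

If the Seq-backed join shows rows while the JDBC-backed one throws the AnalysisException quoted above, that would pin the difference to the JDBC relation rather than the Dataset type itself.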