Which release of Spark are you using?

Is it possible to come up with fake data that shows what you described?
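
Even a couple of dummy rows would do, e.g. something along these lines (the
field names and values are made up, and I don't know whether a temp table
reproduces the jdbc behaviour, but a snippet we can paste straight into the
shell makes it much easier to investigate):

case class Product(product_catalog_id: Option[Int], name: String)

val seqDs = Seq(Product(Some(1), "a"), Product(Some(2), "b")).toDS()
seqDs.toDF().registerTempTable("product")
val tableDs = sqlContext.sql("select * from product").as[Product]

// the self-join that fails for you against the jdbc-backed Dataset
tableDs.as("ts1").joinWith(tableDs.as("ts2"),
  $"ts1.product_catalog_id" === $"ts2.product_catalog_id").show()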

Thanks

On Fri, May 27, 2016 at 8:24 AM, Tim Gautier <tim.gaut...@gmail.com> wrote:

> Unfortunately I can't show exactly the data I'm using, but this is what
> I'm seeing:
>
> I have a case class 'Product' that represents a table in our database. I
> load that data via 
> sqlContext.read.format("jdbc").options(...).load.as[Product]
> and register it in a temp table 'product'.
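>
> The options are just the usual jdbc connection settings; roughly like
> this, with the real url and table name replaced by placeholders:
>
> val productDf = sqlContext.read
>   .format("jdbc")
>   .option("url", "jdbc:...")       // placeholder connection string
>   .option("dbtable", "product")    // placeholder table name
>   .load()
> productDf.registerTempTable("product")
> val products = productDf.as[Product]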
>
> For testing, I created a new Dataset that has only 3 records in it:
>
> val ts = sqlContext.sql("select * from product where product_catalog_id in
> (1, 2, 3)").as[Product]
>
> I also created another one using the same case class and data, but from a
> sequence instead.
>
> val ds: Dataset[Product] = Seq(
>       Product(Some(1), ...),
>       Product(Some(2), ...),
>       Product(Some(3), ...)
>     ).toDS
>
> The Spark shell tells me these are exactly the same type at this point,
> but they don't behave the same.
>
> ts.as("ts1").joinWith(ts.as("ts2"), $"ts1.product_catalog_id" ===
> $"ts2.product_catalog_id")
> ds.as("ds1").joinWith(ds.as("ds2"), $"ds1.product_catalog_id" ===
> $"ds2.product_catalog_id")
>
> Again, Spark tells me these self-joins return exactly the same type, but
> when I do a .show on them, only the one created from a Seq works. The one
> created by reading from the database throws this error:
>
> org.apache.spark.sql.AnalysisException: cannot resolve
> 'ts1.product_catalog_id' given input columns: [..., product_catalog_id,
> ...];
>
> Is this a bug? Is there any way to make the Dataset loaded from a table
> behave like the one created from a sequence?
>
> Thanks,
> Tim
>