I'm working around it like this:
val testMapped2 = test1.rdd.map(t => t.copy(id = t.id + 1)).toDF.as[Test]
testMapped2.as("t1").joinWith(testMapped2.as("t2"), $"t1.id" === $"t2.id").show
Switching to an RDD, mapping there, then going back to a Dataset seemed to
avoid the issue.
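A self-contained version of this workaround might look like the following sketch. It assumes a spark-shell session (Spark 1.6-era API) with the SQL implicits in scope; the names mirror the examples elsewhere in this thread:

```scala
// Sketch of the RDD round-trip workaround, assuming a spark-shell session
// with sqlContext.implicits._ imported.
case class Test(id: Int)

val test = Seq(Test(1), Test(2), Test(3)).toDS

// Dropping to an RDD and rebuilding the Dataset produces a fresh logical
// plan (fresh attribute IDs), which reportedly lets the subsequent
// self-join resolve t1.id and t2.id.
val testMapped = test.rdd.map(t => t.copy(id = t.id + 1)).toDF.as[Test]

testMapped.as("t1")
  .joinWith(testMapped.as("t2"), $"t1.id" === $"t2.id")
  .show()
```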
On Fri, May 27, 2016 at
i am glad to see this, i think we ran into this as well (in 2.0.0-SNAPSHOT)
but i couldn't reproduce it nicely.
my observation was that joins of 2 datasets that were derived from the same
datasource gave this kind of trouble. i changed my datasource from val to
def (so it got created twice) as a workaround.
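The val-to-def change described above can be sketched as follows. This is a hypothetical illustration, not the poster's code; `opts`, the JDBC URL, and the `Product` fields are placeholders:

```scala
// Hypothetical sketch of the val -> def workaround (Spark 1.6-era API;
// opts, the URL, and Product are placeholders).
case class Product(id: Int, name: String)

val opts = Map("url" -> "jdbc:postgresql://host/db", // placeholder
               "dbtable" -> "product")

// val: evaluated once, so both sides of a later self-join share the same
// underlying logical plan and attribute IDs.
val productsVal = sqlContext.read.format("jdbc").options(opts).load.as[Product]

// def: re-evaluated on each reference, so each side of the join gets a
// freshly constructed Dataset with its own attribute IDs.
def productsDef = sqlContext.read.format("jdbc").options(opts).load.as[Product]
```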
I tried the master branch:
scala> val testMapped = test.map(t => t.copy(id = t.id + 1))
testMapped: org.apache.spark.sql.Dataset[Test] = [id: int]
scala> testMapped.as("t1").joinWith(testMapped.as("t2"), $"t1.id" === $"t2.id").show
org.apache.spark.sql.AnalysisException: cannot resolve '`t1.id`'
Oops, screwed up my example. This is what it should be:
case class Test(id: Int)
val test = Seq(
Test(1),
Test(2),
Test(3)
).toDS
test.as("t1").joinWith(test.as("t2"), $"t1.id" === $"t2.id").show
val testMapped = test.map(t => t.copy(id = t.id + 1))
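Completing the corrected example, the step that then fails is the self-join on the mapped Dataset, with the AnalysisException quoted earlier in this thread (a sketch of the reported behavior against 1.6.1 and a 2.0.0-SNAPSHOT, assuming a spark-shell session):

```scala
// Self-joining the original Dataset works:
test.as("t1").joinWith(test.as("t2"), $"t1.id" === $"t2.id").show()

// But self-joining the mapped Dataset reportedly fails with:
//   org.apache.spark.sql.AnalysisException: cannot resolve '`t1.id`'
testMapped.as("t1")
  .joinWith(testMapped.as("t2"), $"t1.id" === $"t2.id")
  .show()
```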
I figured out the trigger. It turns out it wasn't because I loaded it from
the database; it was because the first thing I do after loading is to
lowercase all the strings. After a Dataset has been mapped, the resulting
Dataset can't be self-joined. Here's a test case that illustrates the issue:
I stand corrected. I just created a test table with a single int field to
test with and the Dataset loaded from that works with no issues. I'll see
if I can track down exactly what the difference might be.
On Fri, May 27, 2016 at 10:29 AM Tim Gautier wrote:
> I'm using
I'm using 1.6.1.
I'm not sure what good fake data would do, since it doesn't seem to have
anything to do with the data itself. It has to do with how the Dataset was
created. Both Datasets have exactly the same data in them, but the one
created from a SQL query fails where the one created from a
Which release of Spark are you using?
Is it possible to come up with fake data that shows what you described?
Thanks
On Fri, May 27, 2016 at 8:24 AM, Tim Gautier wrote:
> Unfortunately I can't show exactly the data I'm using, but this is what
> I'm seeing:
>
> I have
Unfortunately I can't show exactly the data I'm using, but this is what I'm
seeing:
I have a case class 'Product' that represents a table in our database. I
load that data via sqlContext.read.format("jdbc").options(...).load.as[Product]
and register it in a temp table 'product'.
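The load-and-register step described here might look roughly like the following. This is only a sketch: the connection options and the fields of `Product` are placeholders, not the poster's actual code, and on Spark 1.6 the temp-table registration goes through a DataFrame:

```scala
// Hypothetical sketch of the described setup (Spark 1.6 API; the URL,
// table name, and Product fields are placeholders).
case class Product(id: Int, name: String)

val product = sqlContext.read
  .format("jdbc")
  .options(Map(
    "url"     -> "jdbc:postgresql://host/db", // placeholder
    "dbtable" -> "product"
  ))
  .load()
  .as[Product]

// Register under the name 'product' so it can be queried with SQL.
product.toDF.registerTempTable("product")
```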
For testing, I