Neville Kadwa created SPARK-15063:
-------------------------------------

             Summary: filtering and joining back doesn't work
                 Key: SPARK-15063
                 URL: https://issues.apache.org/jira/browse/SPARK-15063
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.6.1
            Reporter: Neville Kadwa


I'm trying to filter a DataFrame and join the filtered results back to do a simple pivot, but I'm getting very odd results.

{quote} {noformat}
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

val people = Array((1, "sam"), (2, "joe"), (3, "sally"), (4, "joanna"))
val accounts = Array(
  (1, "checking", 100.0),
  (1, "savings", 300.0),
  (2, "savings", 1000.0),
  (3, "carloan", 12000.0),
  (3, "checking", 400.0)
)

val t1 = sc.makeRDD(people).toDF("uid", "name")
val t2 = sc.makeRDD(accounts).toDF("uid", "type", "amount")

val t2c = t2.filter(t2("type") <=> "checking")
val t2s = t2.filter(t2("type") <=> "savings")

t1.
  join(t2c, t1("uid") <=> t2c("uid"), "left").
  join(t2s, t1("uid") <=> t2s("uid"), "left").
  take(10)
{noformat} {quote}
The results are wrong: sam and sally are each paired with every savings row, and joe's savings row is missing:

{quote} {noformat}
Array(
  [1,sam,1,checking,100.0,1,savings,300.0],
  [1,sam,1,checking,100.0,2,savings,1000.0],
  [2,joe,null,null,null,null,null,null],
  [3,sally,3,checking,400.0,1,savings,300.0],
  [3,sally,3,checking,400.0,2,savings,1000.0],
  [4,joanna,null,null,null,null,null,null]
)
{noformat} {quote}
The way I can force it to work properly is to create a new DataFrame for each filter, so that t2c and t2s are no longer derived from the same t2:

{quote} {noformat}
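// Rebuild the accounts DataFrame from scratch; t1 and t2c are reused from above.
val t2a = sc.makeRDD(accounts).toDF("uid", "type", "amount")
// Redefine t2s against the fresh DataFrame (allowed in the REPL).
val t2s = t2a.filter(t2a("type") <=> "savings")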

t1.
  join(t2c, t1("uid") <=> t2c("uid"), "left").
  join(t2s, t1("uid") <=> t2s("uid"), "left").
  take(10)
{noformat} {quote}
The results are right:

{quote} {noformat}
Array(
  [1,sam,1,checking,100.0,1,savings,300.0],
  [2,joe,null,null,null,2,savings,1000.0],
  [3,sally,3,checking,400.0,null,null,null],
  [4,joanna,null,null,null,null,null,null]
)
{noformat} {quote}
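A possible alternative to rebuilding the DataFrame is to alias each filtered DataFrame and refer to its columns by qualified name. This is only a sketch, under the assumption that the wrong rows come from t2c("uid") and t2s("uid") pointing at the same underlying column of t2, so that the second join condition ends up comparing against the checking side. The alias names "p", "c" and "s" are made up for illustration and are not part of the original report. It is meant to be pasted into spark-shell, where sc is already defined.

```scala
// Sketch for spark-shell on Spark 1.6.x; `sc` is assumed to be provided by the shell.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

val people = Array((1, "sam"), (2, "joe"), (3, "sally"), (4, "joanna"))
val accounts = Array(
  (1, "checking", 100.0),
  (1, "savings", 300.0),
  (2, "savings", 1000.0),
  (3, "carloan", 12000.0),
  (3, "checking", 400.0)
)

// Hypothetical aliases: "p" for people, "c" for checking, "s" for savings.
val p = sc.makeRDD(people).toDF("uid", "name").as("p")
val t2 = sc.makeRDD(accounts).toDF("uid", "type", "amount")
val c = t2.filter($"type" <=> "checking").as("c")
val s = t2.filter($"type" <=> "savings").as("s")

// Qualified names ("c.uid", "s.uid") are looked up through the alias,
// so each join condition names the side it is meant to compare against.
val joined = p.
  join(c, $"p.uid" <=> $"c.uid", "left").
  join(s, $"p.uid" <=> $"s.uid", "left")

val rows = joined.collect()
rows.foreach(println)
```

If the assumption holds, this should print one row per person, as in the correct output above. It does not change the fact that the unaliased version returns wrong rows without any warning.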




