Jamie Hutton created SPARK-17871:
------------------------------------

             Summary: Dataset joinWith syntax should support specifying the 
condition in a compile-time safe way
                 Key: SPARK-17871
                 URL: https://issues.apache.org/jira/browse/SPARK-17871
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.0.1
            Reporter: Jamie Hutton


One of the great things about Datasets is the compile-time type safety they 
enable. However, the joinWith method currently only supports a condition 
specified as a Column, meaning we have to reference the columns by name, which 
removes compile-time checking and leads to errors.

It would be great if Spark could support a join method that allowed 
compile-time checking of the condition. Something like:

leftDS.joinWith(rightDS, (l, r) => l.id == r.id)
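To illustrate the idea without depending on Spark, here is a minimal sketch 
using plain Scala collections. The helper joinWithTyped, the case classes, and 
the field names are all hypothetical and exist only to show how a typed 
(L, R) => Boolean condition gives compile-time checking of field access:

```scala
// Hypothetical illustration, NOT Spark's API: a join whose condition is a
// typed function (L, R) => Boolean, so field access like l.id is checked
// by the compiler instead of being a stringly-typed column name.
case class User(id: Int, name: String)
case class Order(id: Int, userId: Int, total: Double)

def joinWithTyped[L, R](left: Seq[L], right: Seq[R])
                       (cond: (L, R) => Boolean): Seq[(L, R)] =
  for { l <- left; r <- right if cond(l, r) } yield (l, r)

val users  = Seq(User(1, "Ann"), User(2, "Bob"))
val orders = Seq(Order(10, 1, 9.99), Order(11, 2, 5.00))

// A typo such as u.idd would fail to compile, unlike col("idd").
val joined = joinWithTyped(users, orders)((u, o) => u.id == o.userId)
```

With joinWith today, the equivalent condition would be written with Column 
expressions, where a misspelled column name only fails at runtime.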

This would have the added benefit of solving a serialization issue that stops 
joinWith from working with Kryo (because with Kryo we have just one binary 
column called "value" representing the entire object, and we need to be able 
to join on fields within the object - more info here: 
http://stackoverflow.com/questions/36648128/how-to-store-custom-objects-in-a-dataset).
 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
