Jamie Hutton created SPARK-17871:
------------------------------------
Summary: Dataset joinwith syntax should support specifying the
condition in a compile-time safe way
Key: SPARK-17871
URL: https://issues.apache.org/jira/browse/SPARK-17871
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 2.0.1
Reporter: Jamie Hutton
One of the great things about Datasets is the way they enable compile-time
type safety. However, the joinWith method currently only accepts a "condition"
specified as a Column, which means we have to reference columns by name. This
discards compile-time checking and pushes mistakes out to runtime errors.
It would be great if Spark could support a join method that allowed the
condition to be checked at compile time. Something like:
leftDS.joinWith(rightDS, (l, r) => l.id == r.id)
This would have the added benefit of solving a serialization issue that
prevents joinWith from working with Kryo (with Kryo the whole object is
serialized into a single binary column called "value", so there is no named
column to join on, and we need to be able to join on fields within the
object - more info here:
http://stackoverflow.com/questions/36648128/how-to-store-custom-objects-in-a-dataset).
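To illustrate, a minimal sketch of what such an overload might look like. This is a hypothetical API, not something present in Spark 2.0.1; the case classes, the Dataset names, and the proposed signature are all assumptions for illustration:

```scala
// Hypothetical example types (not from the Spark codebase).
case class Customer(id: Long, name: String)
case class Order(orderId: Long, customerId: Long, total: Double)

// A possible typed-condition overload on Dataset[T] (sketch only):
//   def joinWith[U](other: Dataset[U])(condition: (T, U) => Boolean): Dataset[(T, U)]

// Usage would then be checked by the compiler, so a typo such as
// `c.idd` or comparing a Long to a String fails at compile time
// rather than at runtime:
//   val customers: Dataset[Customer] = ...
//   val orders: Dataset[Order] = ...
//   val joined: Dataset[(Customer, Order)] =
//     customers.joinWith(orders)((c, o) => c.id == o.customerId)
```

Because the predicate closes over the deserialized objects rather than named columns, it would also work for Kryo-encoded Datasets, where today only the single binary "value" column is visible to the Column-based API.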
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)