I wrote Datasets, and I'll say I only use them when I really need to (i.e.
when it would be very cumbersome to express what I am trying to do
relationally). Dataset operations are almost always going to be slower
than their DataFrame equivalents since they usually require materializing
objects.
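
To make the materialization point concrete, here is a minimal sketch of the same filter written both ways. The `Person` case class, the sample rows, and the helper names are made up for illustration; only the Spark calls themselves (`toDS`, `filter`, `map`, `select`, `explain`) are real API. It assumes a local Spark 2.x or later session.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Int)

object DatasetVsDataFrame {
  // Typed version: the lambdas are opaque to the optimizer, so each row
  // has to be deserialized into a Person object before they can run.
  def adultsTyped(spark: SparkSession, people: Dataset[Person]): Dataset[String] = {
    import spark.implicits._
    people.filter(p => p.age >= 18).map(_.name)
  }

  // Relational version: the predicate is a column expression that the
  // optimizer can inspect, so no Person objects need to be created.
  def adultsRelational(people: Dataset[Person]): DataFrame =
    people.toDF().filter(col("age") >= 18).select("name")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("dataset-vs-dataframe")
      .getOrCreate()
    import spark.implicits._

    val people = Seq(Person("Ann", 31), Person("Bob", 17)).toDS()

    // Compare the physical plans printed by the two versions.
    adultsTyped(spark, people).explain()
    adultsRelational(people).explain()

    spark.stop()
  }
}
```

Comparing the two `explain()` outputs is a reasonable way to see where the typed version adds object (de)serialization steps around the lambdas; the exact plan text varies by Spark version.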
I am finding the Dataset API very cumbersome to use, which is
unfortunate, as I was looking forward to the type-safety after coming from
a DataFrame codebase.
This link summarizes my troubles:
http://loicdescotte.github.io/posts/spark2-datasets-type-safety/
The problem is having to