Re: Dataset API inconsistencies

2018-01-10 Thread Michael Armbrust
I wrote Datasets, and I'll say I only use them when I really need to (i.e. when it would be very cumbersome to express what I am trying to do relationally). Dataset operations are almost always going to be slower than their DataFrame equivalents since they usually require materializing objects

Dataset API inconsistencies

2018-01-09 Thread Alex Nastetsky
I am finding the Dataset API very cumbersome to use, which is unfortunate, as I was looking forward to the type safety after coming from a DataFrame codebase. This link summarizes my troubles: http://loicdescotte.github.io/posts/spark2-datasets-type-safety/ The problem is having to