> (2) If the method forces evaluation and this matches the most obvious way it would be implemented, then we should add it with a note in the docstring
I am not sure about this, because forcing evaluation can have side effects. For example, df.count() can materialize a cache, so if we implement __len__ to call df.count(), then len(df) would end up populating that cache, which can be unintuitive.

On Fri, Oct 26, 2018 at 1:21 PM Leif Walsh <leif.wa...@gmail.com> wrote:

> That all sounds reasonable but I think in the case of 4 and maybe also 3 I
> would rather see it implemented to raise an error message that explains
> what’s going on and suggests the explicit operation that would do the most
> equivalent thing. And perhaps raise a warning (using the warnings module)
> for things that might be unintuitively expensive.
>
> On Fri, Oct 26, 2018 at 12:15 Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> Coming out of https://github.com/apache/spark/pull/21654 it was agreed
>> the helper methods in question made sense but there was some desire for a
>> plan as to which helper methods we should use.
>>
>> I'd like to propose a lightweight solution to start with for helper
>> methods that match either Pandas or general Python collection helper
>> methods:
>> 1) If the helper method doesn't collect the DataFrame back or force
>> evaluation to the driver then we should add it without discussion
>> 2) If the method forces evaluation and this matches the most obvious way
>> it would be implemented then we should add it with a note in the docstring
>> 3) If the method does collect the DataFrame back to the driver and that
>> is the most obvious way it would be implemented (e.g. calling list to get
>> back a list would have to collect the DataFrame) then we should add it
>> with a warning in the docstring
>> 4) If the method collects the DataFrame but a reasonable Python developer
>> wouldn't expect that behaviour, not implementing the helper method would
>> be better
>>
>> What do folks think?
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
> --
> Cheers,
> Leif
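
To make the len(df) concern concrete, here is a minimal sketch in plain Python. It does not use PySpark: ToyFrame, ImplicitLenFrame, ExplicitLenFrame and WarningLenFrame are hypothetical names made up for illustration, and the `_cache` attribute only mimics a lazily cached DataFrame whose count() fills the cache as a side effect. The last two classes sketch the alternatives suggested in the thread (raise an explanatory error, or warn through the warnings module); they are assumptions about how that could look, not a proposed implementation.

```python
import warnings


class ToyFrame:
    """Hypothetical stand-in for a lazily cached DataFrame (not a Spark API)."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._cache = None  # nothing is materialized until an action runs

    def count(self):
        # An "action": forces evaluation and, as a side effect, fills the cache.
        if self._cache is None:
            self._cache = list(self._rows)
        return len(self._cache)

    @property
    def is_materialized(self):
        return self._cache is not None


class ImplicitLenFrame(ToyFrame):
    def __len__(self):
        # Convenient, but len(df) now silently triggers the side effect above.
        return self.count()


class ExplicitLenFrame(ToyFrame):
    def __len__(self):
        # Alternative 1: refuse, and point at the explicit operation instead.
        raise TypeError(
            "len() would force evaluation of the whole frame; "
            "call count() explicitly if that is what you want"
        )


class WarningLenFrame(ToyFrame):
    def __len__(self):
        # Alternative 2: allow it, but warn that it may be expensive.
        warnings.warn(
            "len() forces evaluation via count()",
            RuntimeWarning,
            stacklevel=2,
        )
        return self.count()


implicit = ImplicitLenFrame(range(5))
before = implicit.is_materialized  # cache is still empty here
n = len(implicit)                  # looks harmless, but runs count()
after = implicit.is_materialized   # cache has now been populated
```

The point is only that `before` and `after` differ even though the caller never asked for an action explicitly; with ExplicitLenFrame the same call fails loudly and leaves the cache untouched.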