> (2) If the method forces evaluation and this matches the most obvious way it
> would be implemented, then we should add it with a note in the docstring

I am not sure about this, because forcing evaluation can have side effects.
For example, df.count() can materialize a cache, so if we implement __len__
to call df.count(), then len(df) would end up populating a cache, which can
be unintuitive.
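
To make the concern concrete, here is a minimal sketch. LazyFrame and
is_materialized() are hypothetical stand-ins I made up for illustration, not
PySpark's API; the sketch only assumes that cache() is lazy and that an action
such as count() is what actually fills the cache.

```python
class LazyFrame:
    """Hypothetical stand-in for a lazily evaluated, cacheable DataFrame."""

    def __init__(self, rows):
        self._rows = rows              # source data, not yet evaluated
        self._cache_requested = False
        self._cached = None            # filled only when an action runs

    def cache(self):
        # Lazy: only marks the frame for caching, nothing is materialized.
        self._cache_requested = True
        return self

    def count(self):
        # An action: forces evaluation and, if caching was requested,
        # populates the cache as a side effect.
        rows = self._cached if self._cached is not None else list(self._rows)
        if self._cache_requested and self._cached is None:
            self._cached = rows
        return len(rows)

    def is_materialized(self):
        return self._cached is not None

    def __len__(self):
        # The helper under discussion: reads like a cheap lookup,
        # but delegates to count() and inherits its side effect.
        return self.count()


df = LazyFrame(range(1000)).cache()
before = df.is_materialized()   # cache requested but still empty
n = len(df)                     # innocuous-looking call
after = df.is_materialized()    # cache is now populated
```

A caller who writes len(df) has no hint that the call evaluated the whole
frame and changed what is held in memory, which is the part I find
unintuitive.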

On Fri, Oct 26, 2018 at 1:21 PM Leif Walsh <leif.wa...@gmail.com> wrote:

> That all sounds reasonable but I think in the case of 4 and maybe also 3 I
> would rather see it implemented to raise an error message that explains
> what’s going on and suggests the explicit operation that would do the most
> equivalent thing. And perhaps raise a warning (using the warnings module)
> for things that might be unintuitively expensive.
> On Fri, Oct 26, 2018 at 12:15 Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> Coming out of https://github.com/apache/spark/pull/21654 it was agreed
>> the helper methods in question made sense but there was some desire for a
>> plan as to which helper methods we should use.
>>
>> I'd like to propose a lightweight solution to start with for helper
>> methods that match either Pandas or general Python collection helper
>> methods:
>> 1) If the helper method doesn't collect the DataFrame back or force
>> evaluation to the driver then we should add it without discussion
>> 2) If the method forces evaluation and this matches the most obvious way it
>> would be implemented, then we should add it with a note in the docstring
>> 3) If the method does collect the DataFrame back to the driver and that
>> is the most obvious way it would be implemented (e.g. calling list to get back
>> a list would have to collect the DataFrame) then we should add it with a
>> warning in the docstring
>> 4) If the method collects the DataFrame but a reasonable Python developer
>> wouldn't expect that behaviour, then not implementing the helper method
>> would be better
>>
>> What do folks think?
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
> --
> --
> Cheers,
> Leif
>
