[ https://issues.apache.org/jira/browse/SPARK-24671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526023#comment-16526023 ]
Apache Spark commented on SPARK-24671:
--------------------------------------
User 'kokes' has created a pull request for this issue:
https://github.com/apache/spark/pull/21654
> DataFrame length using a dunder/magic method in PySpark
> -------------------------------------------------------
>
> Key: SPARK-24671
> URL: https://issues.apache.org/jira/browse/SPARK-24671
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 2.3.1
> Reporter: Ondrej Kokes
> Priority: Minor
>
> In Python, if a class implements a method called `__len__`, the builtin `len`
> function returns the length of an instance of that class, whatever that means
> in its context. This is, for example, how you get the number of rows of a
> pandas DataFrame.
> It should be straightforward to add this functionality to PySpark, because
> df.count() is already implemented, so the patch I'm proposing is just two
> lines of code (and two lines of tests). It's in this commit; I'll submit a PR
> shortly.
> https://github.com/kokes/spark/commit/4d0afaf3cd046b11e8bae43dc00ddf4b1eb97732
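
For illustration, the idea described above can be sketched without a Spark installation. The class below is a hypothetical stand-in, not pyspark.sql.DataFrame and not the code from the linked commit; it only assumes that the proposed `__len__` delegates to the existing count() method.

```python
class FakeDataFrame:
    """Hypothetical stand-in for a PySpark DataFrame (illustration only)."""

    def __init__(self, rows):
        # Keep the rows in a plain list; a real DataFrame is distributed.
        self._rows = list(rows)

    def count(self):
        # Stand-in for df.count(), which returns the number of rows.
        return len(self._rows)

    def __len__(self):
        # The two-line idea: len(df) simply delegates to df.count().
        return self.count()


df = FakeDataFrame([("a", 1), ("b", 2), ("c", 3)])
print(len(df))  # same value as df.count()
```

One design point to keep in mind: when a class defines `__len__` but not `__bool__`, Python uses `__len__` for truth testing, so a statement like `if df:` would call it as well. With a delegation like the one sketched here, that would presumably run a count.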