[
https://issues.apache.org/jira/browse/SPARK-34544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17292987#comment-17292987
]
Rafal Wojdyla commented on SPARK-34544:
---------------------------------------
👋 [~zero323]
> it is more a dev utility than a user-facing feature.
We use mypy to type check our codebase, and we hit this issue as users; see
SPARK-34540 for one example (it is just one case). Btw, I could not find any
documentation for pyspark typing contributions (e.g. in what cases new symbols
should be added to public protocols, and why the protocols are incomplete). I
probably missed it; could you please point me to it?
> As far as I am aware removing it doesn't resolve any of the problems
> described here
Removing {{DataFrameLike}} as the return type of {{toPandas}} would make mypy
stop complaining about missing symbols (which are not part of
{{DataFrameLike}}, but are in fact valid methods of pandas' {{DataFrame}}).
This is obviously suboptimal, since the return type then just becomes {{Any}}.
An alternative is to add the missing symbols to {{DataFrameLike}}, as in
SPARK-34540. But until the next pyspark release, how would we monkey patch that
change in our projects? So in the end it sounds like we have a bunch of
suboptimal ideas. How should we proceed?
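To make the problem concrete, here is a minimal, self-contained sketch of what
we see and of the two user-side workarounds (assert and cast). Note that
{{DataFrameLike}}, {{FullFrame}} and {{to_pandas}} below are hypothetical
stand-ins written for illustration only; they are not the real pyspark or
pandas definitions, and the exact mypy message may differ.

```python
from typing import Any, Dict, List, Protocol, cast


class DataFrameLike(Protocol):
    """Hypothetical stand-in for an incomplete protocol: it declares only `head`."""

    def head(self, n: int = 5) -> Any: ...


class FullFrame:
    """Hypothetical stand-in for pandas.DataFrame: has more methods than the protocol."""

    def __init__(self, rows: List[Dict[str, int]]) -> None:
        self.rows = rows

    def head(self, n: int = 5) -> "FullFrame":
        return FullFrame(self.rows[:n])

    def to_records(self) -> List[Dict[str, int]]:
        return list(self.rows)


def to_pandas() -> DataFrameLike:
    # Annotated with the protocol, although the runtime object is a FullFrame.
    return FullFrame([{"a": 1}, {"a": 2}])


df = to_pandas()
# df.to_records()  # rejected by mypy: the protocol does not declare `to_records`

# Workaround 1: an assert narrows the type for mypy and also checks at runtime.
assert isinstance(df, FullFrame)
records = df.to_records()

# Workaround 2: a cast changes only the static type; nothing is checked at runtime.
records_via_cast = cast(FullFrame, to_pandas()).to_records()
```

In a real project the same pattern would presumably use pandas' {{DataFrame}}
in place of {{FullFrame}}, applied to the result of {{toPandas()}}. Both
workarounds have to be repeated at every call site, which is why we would
prefer a fix in the annotation itself.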
> pyspark toPandas() should return pd.DataFrame
> ---------------------------------------------
>
> Key: SPARK-34544
> URL: https://issues.apache.org/jira/browse/SPARK-34544
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.1.1
> Reporter: Rafal Wojdyla
> Assignee: Maciej Szymkiewicz
> Priority: Major
>
> Right now {{toPandas()}} returns {{DataFrameLike}}, which is an incomplete
> "view" of pandas {{DataFrame}}. This leads to cases where mypy reports that
> certain pandas methods are not present in {{DataFrameLike}}, even though those
> methods are valid methods on pandas {{DataFrame}}, which is the actual type
> of the object. Working around this requires type ignore comments or asserts.