[ https://issues.apache.org/jira/browse/SPARK-53053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Nguyen updated SPARK-53053:
---------------------------------
    Description: 
Original pandas supports properties like _constructor that can be used to easily override the type returned by default when downstream libraries (e.g., Dask) inherit from pandas classes. When the _constructor property is overridden, functions such as head() and __getitem__ all return an object of that type instead of a plain pandas DataFrame. To implement this in the pandas API on Spark, we would have to manually modify functions like self.head() to return self._constructor(result) instead of DataFrame(result).

{code:python}
import pandas as pd

# SubclassedSeries is assumed to be a pd.Series subclass defined elsewhere.
class SubclassedDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return SubclassedDataFrame

    @property
    def _constructor_sliced(self):
        return SubclassedSeries
{code}
[https://pandas.pydata.org/docs/development/extending.html#override-constructor-properties]

  was:
Original pandas supports properties like _constructor that can be used to easily override the type returned by default when downstream libraries (e.g., Dask) inherit from pandas classes. When the _constructor property is overridden,

def _constructor(self):
    return MyDataFrame

functions such as head() and __getitem__ all return an object of that type (MyDataFrame) instead of a plain pandas DataFrame.
To implement this, we would have to manually modify functions like self.head() to return self._constructor(result) instead of DataFrame(result).

https://pandas.pydata.org/docs/development/extending.html#override-constructor-properties

> Support Pandas Extension Properties
> -----------------------------------
>
>                 Key: SPARK-53053
>                 URL: https://issues.apache.org/jira/browse/SPARK-53053
>             Project: Spark
>          Issue Type: Bug
>          Components: Pandas API on Spark
>    Affects Versions: 4.1.0
>            Reporter: Peter Nguyen
>            Priority: Major
>
> Original pandas supports properties like _constructor that can be used to
> easily override the type returned by default when downstream libraries
> (e.g., Dask) inherit from pandas classes. When the _constructor property is
> overridden, functions such as head() and __getitem__ all return an object of
> that type instead of a plain pandas DataFrame. To implement this in the
> pandas API on Spark, we would have to manually modify functions like
> self.head() to return self._constructor(result) instead of DataFrame(result).
> {code:python}
> import pandas as pd
>
> # SubclassedSeries is assumed to be a pd.Series subclass defined elsewhere.
> class SubclassedDataFrame(pd.DataFrame):
>     @property
>     def _constructor(self):
>         return SubclassedDataFrame
>
>     @property
>     def _constructor_sliced(self):
>         return SubclassedSeries
> {code}
> [https://pandas.pydata.org/docs/development/extending.html#override-constructor-properties]
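
For illustration only, the change proposed in this issue (returning self._constructor(result) rather than a hard-coded DataFrame(result)) can be sketched in plain Python. ToyFrame, SubclassedToyFrame, and head_hardcoded are hypothetical names invented for this sketch; they are not classes or methods from pandas or from the pandas API on Spark, and the real implementation may look quite different.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame-like class (an assumption for this sketch)."""

    def __init__(self, rows):
        self.rows = list(rows)

    @property
    def _constructor(self):
        # Default hook: results are wrapped in the base class.
        return ToyFrame

    def head(self, n=5):
        result = self.rows[:n]
        # Route the result through the hook so subclasses control the return type.
        return self._constructor(result)

    def head_hardcoded(self, n=5):
        # The hard-coded pattern the issue wants to replace: always the base class.
        return ToyFrame(self.rows[:n])


class SubclassedToyFrame(ToyFrame):
    @property
    def _constructor(self):
        # Overriding the hook is enough to change what head() returns.
        return SubclassedToyFrame


sub = SubclassedToyFrame(range(10))
via_hook = sub.head(3)
hardcoded = sub.head_hardcoded(3)

print(type(via_hook).__name__)   # SubclassedToyFrame
print(type(hardcoded).__name__)  # ToyFrame
```

The subclass overrides only the property, yet every method that goes through self._constructor picks up the new type. Methods that build the base class directly lose the subclass, which is why each such call site would need to be changed by hand.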