[
https://issues.apache.org/jira/browse/SPARK-57574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-57574:
-----------------------------------
Labels: pull-request-available (was: )
> Support the TIME data type in pandas API on Spark
> -------------------------------------------------
>
> Key: SPARK-57574
> URL: https://issues.apache.org/jira/browse/SPARK-57574
> Project: Spark
> Issue Type: Sub-task
> Components: Pandas API on Spark
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Priority: Major
> Labels: pull-request-available
>
> h2. What
> Support {{TimeType}} columns in pandas API on Spark (pyspark.pandas): dtype mapping, Series/Index support, and basic operations, analogous to Date/Timestamp.
> h2. Gap
> pyspark.pandas largely does not handle {{TimeType}} (only a stray reference in base.py); the Spark <-> pandas dtype machinery treats datetime.time as a generic object.
> h2. Scope
> * Map {{TimeType}} to an appropriate pandas dtype (object holding datetime.time, or a dedicated extension dtype) in the internal type machinery.
> * Support to_pandas / from_pandas and Arrow conversion round-trips (the underlying Arrow conversion is already done: SPARK-53263 / SPARK-53305).
> * Cover Series/Index creation, dtype reporting, and basic operations.
> h2. Acceptance criteria
> * A pandas-on-Spark Series/DataFrame with TIME round-trips to/from pandas and reports a stable dtype.
> * Tests added under python/pyspark/pandas/tests.
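
For illustration only, the value-level round trip named in the acceptance criteria can be sketched in plain Python. This is not pyspark.pandas code: the helper names (time_to_micros, micros_to_time, round_trips) are hypothetical, and the sketch assumes microsecond precision with times encoded as integer microseconds since midnight, the layout Arrow uses for time64 with microsecond unit. Whether the final implementation uses an object dtype or an extension dtype is left open, as in the Scope section.

```python
import datetime

MICROS_PER_SECOND = 1_000_000
MICROS_PER_DAY = 24 * 60 * 60 * MICROS_PER_SECOND


def time_to_micros(t: datetime.time) -> int:
    """Encode a naive datetime.time as microseconds since midnight (hypothetical helper)."""
    if t.tzinfo is not None:
        raise ValueError("expected a naive time without tzinfo")
    seconds = (t.hour * 60 + t.minute) * 60 + t.second
    return seconds * MICROS_PER_SECOND + t.microsecond


def micros_to_time(micros: int) -> datetime.time:
    """Decode microseconds since midnight back into a datetime.time (hypothetical helper)."""
    if not 0 <= micros < MICROS_PER_DAY:
        raise ValueError("value is outside a single day")
    seconds, microsecond = divmod(micros, MICROS_PER_SECOND)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return datetime.time(hour, minute, second, microsecond)


def round_trips(values) -> bool:
    """Return True if every value survives encode/decode unchanged; None models a null."""
    values = list(values)
    restored = [None if v is None else micros_to_time(time_to_micros(v)) for v in values]
    return restored == values


# Boundary values and a null are the cases a round-trip test would most likely cover.
sample = [
    datetime.time(0, 0),
    datetime.time(12, 34, 56, 789),
    None,
    datetime.time(23, 59, 59, 999999),
]
```

A test under python/pyspark/pandas/tests would presumably assert the same property on real Series/DataFrame objects, together with a check that the reported dtype is identical before and after conversion.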