[ https://issues.apache.org/jira/browse/SPARK-57574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-57574:
-----------------------------
    Shepherd: Max Gekk

> Support the TIME data type in pandas API on Spark
> -------------------------------------------------
>
>                 Key: SPARK-57574
>                 URL: https://issues.apache.org/jira/browse/SPARK-57574
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Pandas API on Spark
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Priority: Major
>
> h2. What
> Support {{TimeType}} columns in pandas API on Spark (pyspark.pandas): dtype mapping, Series/Index support, and basic operations, analogous to Date/Timestamp.
> h2. Gap
> pyspark.pandas largely does not handle {{TimeType}} (only a stray reference in base.py); the Spark <-> pandas dtype machinery treats datetime.time as a generic object.
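Inline note: plain pandas has no native time dtype, so a column of {{datetime.time}} values gets the generic {{object}} dtype, which matches the gap described above. {{pandas.api.types.infer_dtype}} can still recognize such a column as "time", which is one way the dtype machinery could tell TIME data apart from arbitrary objects. A minimal sketch, assuming only pandas; {{is_time_series}} is a hypothetical helper, not an existing pyspark.pandas function:

```python
import datetime

import pandas as pd
from pandas.api.types import infer_dtype


def is_time_series(s: pd.Series) -> bool:
    """Hypothetical helper: True if an object column holds only datetime.time values."""
    # pandas stores datetime.time values with the generic object dtype, but
    # infer_dtype still inspects the values and reports "time" for them.
    return s.dtype == object and infer_dtype(s, skipna=True) == "time"


times = pd.Series([datetime.time(9, 30), datetime.time(17, 45, 12, 500)])
strings = pd.Series(["09:30", "17:45"])

print(times.dtype)              # object
print(infer_dtype(times))       # time
print(is_time_series(times))    # True
print(is_time_series(strings))  # False
```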
> h2. Scope
> * Map {{TimeType}} to an appropriate pandas dtype (object holding datetime.time, or a dedicated extension dtype) in the internal type machinery.
> * Support to_pandas / from_pandas and Arrow conversion round-trips (the underlying Arrow conversion is already done: SPARK-53263 / SPARK-53305).
> * Cover Series/Index creation, dtype reporting, and basic operations.
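A sketch of the first mapping option in the scope (TIME reported as {{object}} holding {{datetime.time}}), assuming only pandas and NumPy. The lookup table {{PANDAS_DTYPE_FOR}} and the helper {{time_column_to_pandas}} are hypothetical; the Date and Timestamp entries are included for comparison as assumptions, not as a statement of what pyspark.pandas currently returns.

```python
import datetime

import numpy as np
import pandas as pd

# Hypothetical lookup table: Spark SQL type name -> pandas dtype that a
# pandas-on-Spark column of that type would report. Only the date/time family
# is shown; the date/timestamp entries are assumptions for comparison.
PANDAS_DTYPE_FOR = {
    "date": np.dtype("object"),               # holds datetime.date values
    "timestamp": np.dtype("datetime64[ns]"),  # native pandas datetime dtype
    "time": np.dtype("object"),               # holds datetime.time values
}


def time_column_to_pandas(values):
    """Hypothetical helper: build the pandas side of a TIME column.

    Nulls (None) are allowed; any other non-datetime.time value is rejected.
    """
    values = list(values)  # materialize once so generators are not consumed twice
    for v in values:
        if v is not None and not isinstance(v, datetime.time):
            raise TypeError(f"expected datetime.time, got {type(v).__name__}")
    return pd.Series(values, dtype=PANDAS_DTYPE_FOR["time"])


col = time_column_to_pandas([datetime.time(8, 0), None, datetime.time(12, 30, 5)])
print(col.dtype)  # object
```

Choosing {{object}} keeps TIME consistent with how plain pandas already stores {{datetime.time}}; a dedicated extension dtype would presumably give a more distinctive dtype name, at the cost of more machinery.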
> h2. Acceptance criteria
> * A pandas-on-Spark Series/DataFrame with TIME round-trips to/from pandas and reports a stable dtype.
> * Tests added under python/pyspark/pandas/tests.
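One possible shape for the round-trip acceptance check, as a sketch: a helper that runs a round-trip and requires the result to match the input exactly, dtype included. {{check_time_roundtrip}} is hypothetical, and the copy-based round-trip only stands in for the real pandas-on-Spark conversion so the sketch runs without a Spark session.

```python
import datetime
from typing import Callable

import pandas as pd


def check_time_roundtrip(
    pdf: pd.DataFrame,
    roundtrip: Callable[[pd.DataFrame], pd.DataFrame],
) -> None:
    """Hypothetical test helper: `roundtrip` must preserve TIME values and dtype.

    assert_frame_equal compares values, index, column names and (by default)
    dtypes, so a column that comes back with a different dtype also fails.
    """
    result = roundtrip(pdf)
    pd.testing.assert_frame_equal(result, pdf)


pdf = pd.DataFrame(
    {"t": [datetime.time(0, 0), datetime.time(23, 59, 59, 999999)]}
)

# Stand-in round-trip so the sketch runs without Spark. In a pyspark.pandas
# test this would presumably be `lambda df: ps.from_pandas(df).to_pandas()`
# (with `import pyspark.pandas as ps`), once TIME is supported.
check_time_roundtrip(pdf, lambda df: df.copy())
```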



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
