[jira] [Created] (SPARK-32285) Add PySpark support for nested timestamps with arrow

Bryan Cutler (Jira) Sun, 12 Jul 2020 15:43:14 -0700

Bryan Cutler created SPARK-32285:
------------------------------------

             Summary: Add PySpark support for nested timestamps with arrow
                 Key: SPARK-32285
                 URL: https://issues.apache.org/jira/browse/SPARK-32285
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark, SQL
    Affects Versions: 3.0.0
            Reporter: Bryan Cutler



Currently, with Arrow optimization enabled, timestamp columns are post-processed
in pandas to localize them to the session timezone. This is not done for nested
columns that contain timestamps, such as StructType or ArrayType.
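A minimal sketch of the gap described above, assuming pandas is available. The helper `localize_timestamps` is hypothetical (not PySpark's own function) and "America/Los_Angeles" stands in for the session timezone. A top-level tz-aware column is converted to naive local wall-clock time, while an object column of dicts (how a struct might arrive in pandas) falls through untouched:

```python
import pandas as pd


def localize_timestamps(s: pd.Series, timezone: str) -> pd.Series:
    # Hypothetical helper: convert a tz-aware (UTC) Series to naive
    # wall-clock time in `timezone`. Only top-level timestamp dtypes match.
    if isinstance(s.dtype, pd.DatetimeTZDtype):
        return s.dt.tz_convert(timezone).dt.tz_localize(None)
    # Object columns, e.g. dicts produced from a StructType, are returned as-is.
    return s


flat = pd.Series(pd.to_datetime(["2020-07-12 22:43:14"], utc=True))
nested = pd.Series([{"ts": pd.Timestamp("2020-07-12 22:43:14", tz="UTC")}])

print(localize_timestamps(flat, "America/Los_Angeles")[0])
# 2020-07-12 15:43:14
print(localize_timestamps(nested, "America/Los_Angeles")[0]["ts"])
# 2020-07-12 22:43:14+00:00  (nested value is still in UTC)
```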

Adding support for this is needed for the Apache Arrow 1.0.0 upgrade, due to the
use of structs with timestamps in the groupby key over a window.

As a simple first step, timestamps with one level of nesting could be supported
first, which would satisfy the immediate need.
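One way to picture the one-level first step, as a sketch only: it assumes struct rows reach pandas as dicts of tz-aware `pd.Timestamp` values and array rows as Python lists, which is a simplification of what Arrow's conversion actually produces. The function names are hypothetical. Each timestamp found exactly one level down is converted; deeper nesting is deliberately left alone:

```python
import pandas as pd


def _localize_value(v, timezone):
    # Convert a single tz-aware Timestamp; any other value passes through.
    if isinstance(v, pd.Timestamp) and v.tzinfo is not None:
        return v.tz_convert(timezone).tz_localize(None)
    return v


def localize_nested_one_level(s: pd.Series, timezone: str) -> pd.Series:
    # Hypothetical helper: top-level timestamps plus one level of nesting.
    if isinstance(s.dtype, pd.DatetimeTZDtype):
        return s.dt.tz_convert(timezone).dt.tz_localize(None)
    if s.dtype == object:
        def convert(elem):
            if isinstance(elem, dict):  # assumed struct row
                return {k: _localize_value(v, timezone) for k, v in elem.items()}
            if isinstance(elem, list):  # assumed array row
                return [_localize_value(v, timezone) for v in elem]
            return elem  # None or unrecognized values are left unchanged
        return s.map(convert)
    return s


s = pd.Series([{"ts": pd.Timestamp("2020-07-12 22:43:14", tz="UTC"), "id": 1}])
out = localize_nested_one_level(s, "America/Los_Angeles")
print(out[0])
# {'ts': Timestamp('2020-07-12 15:43:14'), 'id': 1}
```

Limiting the recursion to one level keeps the change small, matching the "immediate need" framing; a fully recursive walk could replace `convert` later.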

NOTE: with Arrow 1.0.0, it might be possible to do the timezone processing with
pyarrow.Array.cast, which could be easier than doing it in pandas.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
