[
https://issues.apache.org/jira/browse/SPARK-25798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-25798:
------------------------------------
Assignee: (was: Apache Spark)
> Internally document type conversion between Pandas data and SQL types in
> Pandas UDFs
> ------------------------------------------------------------------------------------
>
> Key: SPARK-25798
> URL: https://issues.apache.org/jira/browse/SPARK-25798
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 2.4.0
> Reporter: Hyukjin Kwon
> Priority: Minor
>
> Currently, UDF's type coercion is not cleanly defined. See also
> https://github.com/apache/spark/pull/22610 and
> https://github.com/apache/spark/pull/22610
> This JIRA targets to describe the type conversion logic internally. For
> instance:
> {code}
> #
> +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+
> # noqa
> # |SQL Type \ Pandas Type|True(bool)|1(int8)|1(int16)|
> 1(int32)|
> 1(int64)|1(uint8)|1(uint16)|1(uint32)|1(uint64)|a(object)|1970-01-01
> 00:00:00(datetime64[ns])|1970-01-01 00:00:00-05:00(datetime64[ns,
> US/Eastern])|1.0(float64)|[1 2 3](object(array))|A(category)|1 days
> 00:00:00(timedelta64[ns])| # noqa
> #
> +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+
> # noqa
> # | boolean| True| True| True|
> True| True| True| True| True| True| X|
> False|
> False| False| X| X|
> False| # noqa
> # | tinyint| 1| 1| 1|
> 1| 1| X| X| X| X| X|
> X|
> X| 1| X| 0|
> X| # noqa
> # | smallint| 1| 1| 1|
> 1| 1| 1| X| X| X| X|
> X|
> X| 1| X| X|
> X| # noqa
> # | int| 1| 1| 1|
> 1| 1| 1| 1| X| X| X|
> X|
> X| 1| X| X|
> X| # noqa
> # | bigint| 1| 1| 1|
> 1| 1| 1| 1| 1| X| X|
> 0|
> 18000000000000| 1| X| X|
> X| # noqa
> # | string| u''|u'\x01'| u'\x01'|
> u'\x01'| u'\x01'| u'\x01'| u'\x01'| u'\x01'| u'\x01'|
> u'a'| X|
> X| u''| X| X|
> X| # noqa
> # | date| X| X|
> X|datetime.date(197...| X| X| X| X|
> X| X| datetime.date(197...|
> X| X| X| X|
> X| # noqa
> # | timestamp| X| X| X|
> X|datetime.datetime...| X| X| X| X| X|
> datetime.datetime...|
> datetime.datetime...| X| X| X|
> X| # noqa
> # | float| 1.0| 1.0| 1.0|
> 1.0| 1.0| 1.0| 1.0| 1.0| 1.0| X|
> X|
> X| 1.0| X| X|
> X| # noqa
> # | double| 1.0| 1.0| 1.0|
> 1.0| 1.0| 1.0| 1.0| 1.0| 1.0| X|
> X|
> X| 1.0| X| X|
> X| # noqa
> # | array<int>| X| X| X|
> X| X| X| X| X| X| X|
> X|
> X| X| [1, 2, 3]| X|
> X| # noqa
> # | binary| X| X| X|
> X| X| X| X| X| X| X|
> X|
> X| X| X| X|
> X| # noqa
> # | decimal(10,0)| X| X| X|
> X| X| X| X| X| X| X|
> X|
> X| X| X| X|
> X| # noqa
> # | map<string,int>| X| X| X|
> X| X| X| X| X| X| X|
> X|
> X| X| X| X|
> X| # noqa
> # | struct<_1:int>| X| X| X|
> X| X| X| X| X| X| X|
> X|
> X| X| X| X|
> X| # noqa
> #
> +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+
> # noqa
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
