[ https://issues.apache.org/jira/browse/SPARK-25798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuanjian Li updated SPARK-25798:
--------------------------------
    Description: 
Currently, type coercion in UDFs is not cleanly defined. See also
https://github.com/apache/spark/pull/20163 and
https://github.com/apache/spark/pull/22610

This JIRA aims to describe the type conversion logic internally. For
instance:

{code}
    # +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+  # noqa
    # |SQL Type \ Pandas Type|True(bool)|1(int8)|1(int16)|            1(int32)|            1(int64)|1(uint8)|1(uint16)|1(uint32)|1(uint64)|a(object)|1970-01-01 00:00:00(datetime64[ns])|1970-01-01 00:00:00-05:00(datetime64[ns, US/Eastern])|1.0(float64)|[1 2 3](object(array))|A(category)|1 days 00:00:00(timedelta64[ns])|  # noqa
    # +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+  # noqa
    # |               boolean|      True|   True|    True|                True|                True|    True|     True|     True|     True|        X|                              False|                                                False|       False|                     X|          X|                           False|  # noqa
    # |               tinyint|         1|      1|       1|                   1|                   1|       X|        X|        X|        X|        X|                                  X|                                                    X|           1|                     X|          0|                               X|  # noqa
    # |              smallint|         1|      1|       1|                   1|                   1|       1|        X|        X|        X|        X|                                  X|                                                    X|           1|                     X|          X|                               X|  # noqa
    # |                   int|         1|      1|       1|                   1|                   1|       1|        1|        X|        X|        X|                                  X|                                                    X|           1|                     X|          X|                               X|  # noqa
    # |                bigint|         1|      1|       1|                   1|                   1|       1|        1|        1|        X|        X|                                  0|                                       18000000000000|           1|                     X|          X|                               X|  # noqa
    # |                string|       u''|u'\x01'| u'\x01'|             u'\x01'|             u'\x01'| u'\x01'|  u'\x01'|  u'\x01'|  u'\x01'|     u'a'|                                  X|                                                    X|         u''|                     X|          X|                               X|  # noqa
    # |                  date|         X|      X|       X|datetime.date(197...|                   X|       X|        X|        X|        X|        X|               datetime.date(197...|                                                    X|           X|                     X|          X|                               X|  # noqa
    # |             timestamp|         X|      X|       X|                   X|datetime.datetime...|       X|        X|        X|        X|        X|               datetime.datetime...|                                 datetime.datetime...|           X|                     X|          X|                               X|  # noqa
    # |                 float|       1.0|    1.0|     1.0|                 1.0|                 1.0|     1.0|      1.0|      1.0|      1.0|        X|                                  X|                                                    X|         1.0|                     X|          X|                               X|  # noqa
    # |                double|       1.0|    1.0|     1.0|                 1.0|                 1.0|     1.0|      1.0|      1.0|      1.0|        X|                                  X|                                                    X|         1.0|                     X|          X|                               X|  # noqa
    # |            array<int>|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|             [1, 2, 3]|          X|                               X|  # noqa
    # |                binary|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X|  # noqa
    # |         decimal(10,0)|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X|  # noqa
    # |       map<string,int>|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X|  # noqa
    # |        struct<_1:int>|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X|  # noqa
    # +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+  # noqa
{code}
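
A matrix like the one above could be produced by a small harness that tries every (SQL type, sample value) pair and records the outcome. The following is only a minimal sketch, not the script used for this issue. It assumes that `X` marks a conversion that raises an error (the description does not define it). `toy_convert` is a hypothetical stand-in for running a real Pandas UDF; its rules are purely illustrative and are not Spark's or Arrow's behavior.

```python
from typing import Any, Callable, Dict, List, Sequence


def build_matrix(
    sql_types: Sequence[str],
    samples: Sequence[Any],
    convert: Callable[[Any, str], Any],
) -> Dict[str, List[str]]:
    """Try every (SQL type, sample value) pair and record the outcome.

    A cell holds repr(result) when convert() succeeds and "X" when it raises.
    """
    matrix: Dict[str, List[str]] = {}
    for sql_type in sql_types:
        cells = []
        for value in samples:
            try:
                cells.append(repr(convert(value, sql_type)))
            except Exception:
                cells.append("X")
        matrix[sql_type] = cells
    return matrix


def toy_convert(value: Any, sql_type: str) -> Any:
    """Hypothetical stand-in converter; NOT Spark's or Arrow's actual rules."""
    if sql_type == "boolean" and isinstance(value, (bool, int, float)):
        return bool(value)
    if sql_type == "bigint" and isinstance(value, (bool, int, float)):
        return int(value)
    if sql_type == "string" and isinstance(value, str):
        return value
    raise TypeError(f"cannot convert {value!r} to {sql_type}")


def render(matrix: Dict[str, List[str]], header: Sequence[str]) -> str:
    """Render the matrix as right-aligned, pipe-delimited rows."""
    rows = [list(header)] + [[name] + cells for name, cells in matrix.items()]
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    return "\n".join(
        "|" + "|".join(cell.rjust(w) for cell, w in zip(row, widths)) + "|"
        for row in rows
    )


samples = [True, 1, 1.0, "a"]
matrix = build_matrix(["boolean", "bigint", "string"], samples, toy_convert)
print(render(matrix, ["SQL Type \\ Value"] + [repr(s) for s in samples]))
```

To document real behavior, `toy_convert` would be replaced by a function that applies a Pandas UDF with the given return type to a one-row DataFrame and collects the result.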

  was:
Currently, type coercion in UDFs is not cleanly defined. See also
https://github.com/apache/spark/pull/22610 and
https://github.com/apache/spark/pull/22610

This JIRA aims to describe the type conversion logic internally. For
instance:

(The table in the previous description is identical to the one above.)


> Internally document type conversion between Pandas data and SQL types in 
> Pandas UDFs
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-25798
>                 URL: https://issues.apache.org/jira/browse/SPARK-25798
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Hyukjin Kwon
>            Priority: Minor


