[ 
https://issues.apache.org/jira/browse/SPARK-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147591#comment-15147591
 ] 

Davies Liu commented on SPARK-13323:
------------------------------------

HiveTypeCoercion is pretty complicated; we may not want to duplicate that in 
Python.

What's the problem right now? Or is this just because of the TODO?

> Type cast support in type inference during merging types.
> ---------------------------------------------------------
>
>                 Key: SPARK-13323
>                 URL: https://issues.apache.org/jira/browse/SPARK-13323
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.0.0
>            Reporter: Hyukjin Kwon
>
> As described in {{types.py}}, there is a TODO: {{TODO: type cast (such as int 
> -> long)}}.
> Currently, PySpark infers types but does not try to find a compatible type 
> when the inferred types differ while merging schemas.
> I think this can be done by mirroring 
> {{HiveTypeCoercion.findTightestCommonTypeOfTwo}} for numeric types and, when 
> either of the two types is {{StringType}}, simply converting both to string.
> The possible leaf data types appear to be the following:
> {code}
> # Mapping Python types to Spark SQL DataType
> _type_mappings = {
>     type(None): NullType,
>     bool: BooleanType,
>     int: LongType,
>     float: DoubleType,
>     str: StringType,
>     bytearray: BinaryType,
>     decimal.Decimal: DecimalType,
>     datetime.date: DateType,
>     datetime.datetime: TimestampType,
>     datetime.time: TimestampType,
> }
> {code}
> and they all convert cleanly to strings, as shown below:
> {code}
> >>> print str(None)
> None
> >>> print str(True)
> True
> >>> print str(float(0.1))
> 0.1
> >>> str(bytearray([255]))
> '\xff'
> >>> str(decimal.Decimal())
> '0'
> >>> str(datetime.date(1,1,1))
> '0001-01-01'
> >>> str(datetime.datetime(1,1,1))
> '0001-01-01 00:00:00'
> >>> str(datetime.time(1,1,1))
> '01:01:01'
> {code}
> I looked for an existing issue covering this but could not find one. Please 
> mark this as a duplicate if one already exists.
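
For illustration only, the merge rule proposed in the quoted description could be sketched roughly as below. This is not the code in {{types.py}} and not a port of HiveTypeCoercion: the type names are plain strings standing in for the Spark SQL classes, and NUMERIC_PRECEDENCE, find_tightest_common_type and merge_leaf_types are hypothetical names chosen for this sketch. The widening order is an assumption that ignores DecimalType.

```python
# Hypothetical stand-ins for Spark SQL leaf types (plain strings, not PySpark classes).
# Assumed widening order for non-decimal numeric types, narrowest first.
NUMERIC_PRECEDENCE = [
    "ByteType", "ShortType", "IntegerType", "LongType", "FloatType", "DoubleType",
]


def find_tightest_common_type(a, b):
    """Return the narrowest type that holds both a and b, or None if there is none."""
    if a == b:
        return a
    # A null value is compatible with any other type.
    if a == "NullType":
        return b
    if b == "NullType":
        return a
    # Two numeric types widen to whichever comes later in the precedence list.
    if a in NUMERIC_PRECEDENCE and b in NUMERIC_PRECEDENCE:
        return max(a, b, key=NUMERIC_PRECEDENCE.index)
    return None


def merge_leaf_types(a, b):
    """Merge two leaf types, falling back to StringType when either side is a string."""
    common = find_tightest_common_type(a, b)
    if common is not None:
        return common
    if "StringType" in (a, b):
        return "StringType"
    raise TypeError("Can not merge type %s and %s" % (a, b))
```

With this sketch, merging LongType with DoubleType yields DoubleType, merging DateType with StringType yields StringType, and merging BooleanType with DateType still raises TypeError because neither rule applies.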



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

