[ https://issues.apache.org/jira/browse/SPARK-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147591#comment-15147591 ]
Davies Liu commented on SPARK-13323:
------------------------------------

HiveTypeCoercion is pretty complicated; we may not want to duplicate it in Python. What's the problem right now? Or is it just because of the TODO?

> Type cast support in type inference during merging types.
> ---------------------------------------------------------
>
>                 Key: SPARK-13323
>                 URL: https://issues.apache.org/jira/browse/SPARK-13323
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.0.0
>            Reporter: Hyukjin Kwon
>
> As described in {{types.py}}, there is a TODO: {{TODO: type cast (such as int -> long)}}.
> Currently, PySpark infers types but does not try to find a compatible type
> when the given types differ while merging schemas.
> I think this can be done by mirroring
> {{HiveTypeCoercion.findTightestCommonTypeOfTwo}} for numbers and, when either
> of the two types is {{StringType}}, just converting both to string.
> It looks like the possible leaf data types are the following:
> {code}
> # Mapping Python types to Spark SQL DataType
> _type_mappings = {
>     type(None): NullType,
>     bool: BooleanType,
>     int: LongType,
>     float: DoubleType,
>     str: StringType,
>     bytearray: BinaryType,
>     decimal.Decimal: DecimalType,
>     datetime.date: DateType,
>     datetime.datetime: TimestampType,
>     datetime.time: TimestampType,
> }
> {code}
> and they convert to strings reasonably well, as shown below:
> {code}
> >>> print str(None)
> None
> >>> print str(True)
> True
> >>> print str(float(0.1))
> 0.1
> >>> str(bytearray([255]))
> '\xff'
> >>> str(decimal.Decimal())
> '0'
> >>> str(datetime.date(1,1,1))
> '0001-01-01'
> >>> str(datetime.datetime(1,1,1))
> '0001-01-01 00:00:00'
> >>> str(datetime.time(1,1,1))
> '01:01:01'
> {code}
> I tried to find an existing issue for this first but couldn't. Please
> mark this as a duplicate if one already exists.
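For reference, the merge rule proposed in the issue (widen numeric types, fall back to string when either side is a string) could be sketched roughly as follows. This is only an illustration, not the PySpark implementation: the type classes are hypothetical stand-ins so that the snippet runs without Spark, {{merge_leaf_types}} and {{_NUMERIC_PRECEDENCE}} are made-up names, and the widening order shown is an assumed subset of what {{HiveTypeCoercion}} handles.

```python
# Hypothetical stand-ins for the pyspark.sql.types classes, so that this
# sketch runs without Spark installed.
class DataType(object):
    def __repr__(self):
        return type(self).__name__

    def __eq__(self, other):
        return type(self) is type(other)

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return hash(type(self).__name__)


class NullType(DataType):
    pass


class BooleanType(DataType):
    pass


class IntegerType(DataType):
    pass


class LongType(DataType):
    pass


class DoubleType(DataType):
    pass


class StringType(DataType):
    pass


# Assumed widening order for numeric leaf types, narrowest first.
_NUMERIC_PRECEDENCE = [IntegerType, LongType, DoubleType]


def merge_leaf_types(a, b):
    """Return a leaf type compatible with both a and b (illustrative only)."""
    # A null on either side adopts the other side's type.
    if isinstance(a, NullType):
        return b
    if isinstance(b, NullType):
        return a
    if a == b:
        return a
    # If either side is a string, fall back to string, as the issue proposes.
    if isinstance(a, StringType) or isinstance(b, StringType):
        return StringType()
    # Both numeric: pick the wider of the two.
    if type(a) in _NUMERIC_PRECEDENCE and type(b) in _NUMERIC_PRECEDENCE:
        widest = max(_NUMERIC_PRECEDENCE.index(type(a)),
                     _NUMERIC_PRECEDENCE.index(type(b)))
        return _NUMERIC_PRECEDENCE[widest]()
    # Anything else (e.g. boolean vs. long) is still treated as a conflict.
    raise TypeError("Can not merge type %s and %s" % (a, b))
```

With these stand-ins, {{merge_leaf_types(LongType(), DoubleType())}} returns a {{DoubleType}} and {{merge_leaf_types(BooleanType(), StringType())}} returns a {{StringType}}, while a pair such as {{BooleanType}} and {{LongType}} still raises {{TypeError}}. Container types (struct, array, map) are deliberately left out of the sketch.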