[ https://issues.apache.org/jira/browse/SPARK-21465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiang Gao updated SPARK-21465:
------------------------------
    Description: 
Currently, PySpark's support for the different {{array.array}} typecodes is not clearly defined. As a result, in Python 3, creating a {{DataFrame}} from an {{array('L')}} raises an exception, while in Python 2 the same code raises no exception and instead converts 'L' to a smaller integer type. In Python 2, this can lead to an overflow error if the input values are large enough.

To avoid this unexpected behavior, we should either raise an exception in Python 2 for {{array('L')}} telling the user that it is not supported, or support it by using a larger data type on the JVM side, such as BigInt.

See the discussion starting from https://github.com/apache/spark/pull/18444#discussion_r128132584

  was:
Currently, PySpark's support for the different {{array.array}} typecodes is not clearly defined. As a result, in Python 3, creating a {{DataFrame}} from an {{array('L')}} raises an exception, while in Python 2 the same code raises no exception and instead converts 'L' to a smaller integer type. In Python 2, this can lead to an overflow error if the input values are large enough.

To avoid this unexpected behavior, we should either raise an exception in Python 2 for {{array('L')}} telling the user that it is not supported, or support it by using a larger data type on the JVM side, such as BigInt.

> array('L') support might lead to overflow error
> -----------------------------------------------
>
>                 Key: SPARK-21465
>                 URL: https://issues.apache.org/jira/browse/SPARK-21465
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.0
>            Reporter: Xiang Gao
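A minimal sketch of the first option (rejecting {{array('L')}} up front), written in plain Python rather than against PySpark internals. It assumes that Spark SQL's LongType is a signed 64-bit integer and that the unsigned 'L' typecode has a platform-dependent item size (8 bytes on most 64-bit Linux and macOS builds, 4 bytes on Windows). The helpers `unsigned_max`, `fits_long_type` and `check_array_typecode` are hypothetical and are not part of the PySpark API.

```python
from array import array

# Assumption: Spark SQL's LongType is a signed 64-bit integer on the JVM.
LONG_MIN, LONG_MAX = -(2 ** 63), 2 ** 63 - 1


def unsigned_max(typecode):
    """Largest value an unsigned array typecode can hold on this platform."""
    itemsize = array(typecode).itemsize  # bytes per element; platform dependent
    return 2 ** (8 * itemsize) - 1


def fits_long_type(value):
    """True if an integer can be stored in LongType without overflow."""
    return LONG_MIN <= value <= LONG_MAX


def check_array_typecode(arr):
    """Hypothetical guard: reject array('L') when its range exceeds LongType."""
    if arr.typecode == 'L' and not fits_long_type(unsigned_max('L')):
        raise TypeError(
            "array('L') is not supported: values up to %d may overflow LongType"
            % unsigned_max('L'))
    return arr
```

For example, on a 64-bit Linux build `check_array_typecode(array('L', [1]))` raises a `TypeError`, because 'L' can hold values up to 2**64 - 1, while `array('l', [1])` is returned unchanged. Checking the typecode rather than the actual values means the user learns about the limitation even when the current data happens to fit.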