[ https://issues.apache.org/jira/browse/SPARK-12624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheng Lian updated SPARK-12624:
-------------------------------
    Description: 
The following code snippet reproduces this issue:
{code}
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
from pyspark.sql.types import Row

schema = StructType([StructField("a", IntegerType()), StructField("b", StringType())])
rdd = sc.parallelize(range(10)).map(lambda x: Row(a=x))
df = sqlContext.createDataFrame(rdd, schema)
df.show()
{code}
An unintuitive {{ArrayIndexOutOfBoundsException}} is thrown in this case:
{code}
...
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
        at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.genericGet(rows.scala:227)
        at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getAs(rows.scala:35)
        at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.isNullAt(rows.scala:36)
...
{code}
We should give a better error message here.

  was:
See https://github.com/apache/spark/pull/10564

Basically that test case should pass without the above fix and just assume b is null.


> When schema is specified, we should give better error message if actual row
> length doesn't match
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-12624
>                 URL: https://issues.apache.org/jira/browse/SPARK-12624
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>            Reporter: Reynold Xin
>            Priority: Blocker
>
> The following code snippet reproduces this issue:
> {code}
> from pyspark.sql.types import StructType, StructField, IntegerType, StringType
> from pyspark.sql.types import Row
>
> schema = StructType([StructField("a", IntegerType()), StructField("b", StringType())])
> rdd = sc.parallelize(range(10)).map(lambda x: Row(a=x))
> df = sqlContext.createDataFrame(rdd, schema)
> df.show()
> {code}
> An unintuitive {{ArrayIndexOutOfBoundsException}} is thrown in this case:
> {code}
> ...
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
>         at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.genericGet(rows.scala:227)
>         at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getAs(rows.scala:35)
>         at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.isNullAt(rows.scala:36)
> ...
> {code}
> We should give a better error message here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
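For reference, a minimal Spark-free sketch of the kind of up-front length check the issue asks for: validate each row against the schema's field count before conversion and fail with a descriptive message, rather than letting the JVM side throw an {{ArrayIndexOutOfBoundsException}} later. The helper name {{verify_row_length}} and the exact message wording are hypothetical, not actual Spark code:

```python
# Hedged sketch, not the real Spark implementation: report a clear error
# when a row is shorter (or longer) than the declared schema, instead of
# failing deep inside Catalyst with ArrayIndexOutOfBoundsException.
def verify_row_length(row, field_names):
    """Raise ValueError with a descriptive message on a row/schema length mismatch."""
    if len(row) != len(field_names):
        raise ValueError(
            "Length of object (%d) does not match length of fields (%d)"
            % (len(row), len(field_names))
        )

fields = ["a", "b"]                    # mirrors StructType([a: int, b: string])
verify_row_length((1, "x"), fields)    # matching length: passes silently

try:
    verify_row_length((1,), fields)    # like Row(a=x): one value, two fields
except ValueError as e:
    error_message = str(e)
```

With a check like this in the Python-to-Row conversion path, the repro above would fail immediately with a message naming both lengths, which is far easier to act on than the Catalyst stack trace.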