[ https://issues.apache.org/jira/browse/SPARK-35461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-35461:
----------------------------------
    Affects Version/s:     (was: 3.1.1)
                           (was: 3.0.2)
                           3.2.0

> Error when reading dictionary-encoded Parquet int column when read schema is
> bigint
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-35461
>                 URL: https://issues.apache.org/jira/browse/SPARK-35461
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Chao Sun
>            Priority: Major
>
> When reading a dictionary-encoded integer column from a Parquet file with a
> user-specified read schema of bigint, Spark currently fails with the
> following exception:
> {code}
> java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
> 	at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:49)
> 	at org.apache.spark.sql.execution.datasources.parquet.ParquetDictionary.decodeToLong(ParquetDictionary.java:50)
> 	at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:364)
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
> 	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> 	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
> 	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:344)
> {code}
> To reproduce:
> {code}
> val data = (0 to 10).flatMap(n => Seq.fill(10)(n)).map(i => (i, i.toString))
> withParquetFile(data) { path =>
>   val readSchema = StructType(Seq(StructField("_1", LongType)))
>   spark.read.schema(readSchema).parquet(path).first()
> }
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)