cxzl25 opened a new pull request, #1910:

URL: https://github.com/apache/orc/pull/1910
### What changes were proposed in this pull request?

This PR makes the benchmark write Parquet decimal data using the `FIXED_LEN_BYTE_ARRAY` physical type.

### Why are the changes needed?

Decimal columns in the Parquet files currently generated by the benchmark are stored with the Parquet `BINARY` physical type, which Spark 3.5.1 cannot read. Spark 3.5.1 can read them when they are stored as `FIXED_LEN_BYTE_ARRAY`.

`main`:

```
optional binary fare_amount (DECIMAL(8,2));
```

This PR:

```
optional fixed_len_byte_array(5) fare_amount (DECIMAL(10,2));
```

Running the Spark benchmark against the Parquet taxi data:

```bash
java -jar spark/target/orc-benchmarks-spark-2.1.0-SNAPSHOT.jar spark data -format=parquet -compress zstd -data taxi
```

fails on `main` with:

```java
org.apache.spark.sql.execution.datasources.SchemaColumnConvertNotSupportedException: column: [fare_amount], physicalType: BINARY, logicalType: decimal(8,2)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory.constructConvertNotSupportedException(ParquetVectorUpdaterFactory.java:1136)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory.getUpdater(ParquetVectorUpdaterFactory.java:199)
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:175)
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:342)
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:233)
	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
	at org.apache.orc.bench.spark.SparkBenchmark.processReader(SparkBenchmark.java:170)
	at org.apache.orc.bench.spark.SparkBenchmark.fullRead(SparkBenchmark.java:216)
	at org.apache.orc.bench.spark.jmh_generated.SparkBenchmark_fullRead_jmhTest.fullRead_avgt_jmhStub(SparkBenchmark_fullRead_jmhTest.java:219)
```

### How was this patch tested?
Local test.

### Was this patch authored or co-authored using generative AI tooling?

No.
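For reference, here is how a Parquet DECIMAL is laid out in `FIXED_LEN_BYTE_ARRAY`. Per the Parquet format specification, the unscaled value is stored as a big-endian two's-complement integer in a fixed number of bytes, and that width must be large enough for the declared precision. The Java sketch below illustrates this under those assumptions. `FixedDecimalSketch` and its methods are hypothetical names chosen for illustration; this is not the code changed in this PR, nor a Parquet or Spark API. Its width rule (smallest `n` with 2^(8n-1) >= 10^precision) gives 5 bytes for precision 10, which matches the `fixed_len_byte_array(5)` schema for `DECIMAL(10,2)`.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;
import java.util.Arrays;

/**
 * Hypothetical illustration (not ORC/Parquet/Spark code) of the Parquet DECIMAL
 * encoding on FIXED_LEN_BYTE_ARRAY: the unscaled value as a big-endian
 * two's-complement integer, sign-extended to a fixed byte width.
 */
final class FixedDecimalSketch {

  /** Smallest byte width n such that 2^(8n-1) >= 10^precision. */
  static int minBytesForPrecision(int precision) {
    BigInteger limit = BigInteger.TEN.pow(precision);
    int n = 1;
    while (BigInteger.ONE.shiftLeft(8 * n - 1).compareTo(limit) < 0) {
      n++;
    }
    return n;
  }

  /** Encodes value at the given scale into exactly `width` bytes. */
  static byte[] encode(BigDecimal value, int scale, int width) {
    // Throws ArithmeticException if the value has more fractional digits than `scale`.
    BigInteger unscaled = value.setScale(scale, RoundingMode.UNNECESSARY).unscaledValue();
    byte[] minimal = unscaled.toByteArray(); // big-endian two's complement, minimal length
    if (minimal.length > width) {
      throw new IllegalArgumentException(value + " does not fit in " + width + " bytes");
    }
    byte[] out = new byte[width];
    // Sign-extend: pad with 0x00 for non-negative values, 0xFF for negative ones.
    Arrays.fill(out, 0, width - minimal.length, (byte) (unscaled.signum() < 0 ? 0xFF : 0x00));
    System.arraycopy(minimal, 0, out, width - minimal.length, minimal.length);
    return out;
  }

  /** Decodes fixed-width two's-complement bytes back into a BigDecimal. */
  static BigDecimal decode(byte[] bytes, int scale) {
    return new BigDecimal(new BigInteger(bytes), scale);
  }

  /** Lowercase hex rendering, for inspecting the encoded bytes. */
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02x", b & 0xFF));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    int width = minBytesForPrecision(10); // 5, matching fixed_len_byte_array(5)
    byte[] encoded = encode(new BigDecimal("12.50"), 2, width);
    System.out.println(width + " " + hex(encoded)); // prints "5 00000004e2"
    System.out.println(decode(encoded, 2)); // prints "12.50"
  }
}
```

The same rule yields 4 bytes for precision 8. Sign extension is what allows negative amounts (for example `-3.25` encodes as `fffffffebb`) to round-trip through a fixed-width column.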