Davies Liu created PARQUET-377:
----------------------------------
Summary: The file written in version 2 can't be read back
Key: PARQUET-377
URL: https://issues.apache.org/jira/browse/PARQUET-377
Project: Parquet
Issue Type: Bug
Affects Versions: 1.7.0
Reporter: Davies Liu
I tried to save the TPC-DS table store_sales as version 2, but it can't be read
back:
{code}
org.apache.parquet.io.ParquetDecodingException: Can not read value at 852499 in block 0 in file file:/opt/store_sales/part-r-00002-f0497de9-0bf7-4cb2-98d0-b1dcf5a71ca8.gz.parquet
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:228)
at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201)
at org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.hasNext(SqlNewHadoopRDD.scala:168)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1551)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1843)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1843)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.parquet.io.ParquetDecodingException: Encoding DELTA_BYTE_ARRAY is only supported for type BINARY
at org.apache.parquet.column.Encoding$7.getValuesReader(Encoding.java:196)
at org.apache.parquet.column.impl.ColumnReaderImpl.initDataReader(ColumnReaderImpl.java:537)
at org.apache.parquet.column.impl.ColumnReaderImpl.readPageV2(ColumnReaderImpl.java:577)
at org.apache.parquet.column.impl.ColumnReaderImpl.access$400(ColumnReaderImpl.java:57)
at org.apache.parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:521)
at org.apache.parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:513)
at org.apache.parquet.column.page.DataPageV2.accept(DataPageV2.java:141)
at org.apache.parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:513)
at org.apache.parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:505)
at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:607)
at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:407)
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:209)
... 13 more
{code}
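Going by the "Caused by" line, the reader seems to reject a version 2 data page because the page is DELTA_BYTE_ARRAY-encoded while the column's type is not BINARY. The trace does not say which column or type was involved. Below is a toy model of that kind of check, only to make the failure mode concrete. It is not parquet-mr's code: the function names, the exception class, and the FIXED_LEN_BYTE_ARRAY example type are all hypothetical.

```python
# Toy model of the failure in the trace above; NOT parquet-mr's implementation.
# All names here are hypothetical and exist only for illustration.


class ParquetDecodingError(Exception):
    """Stand-in for org.apache.parquet.io.ParquetDecodingException."""


def get_values_reader(encoding: str, physical_type: str) -> str:
    """Pick a reader for a data page, rejecting unsupported type/encoding pairs."""
    if encoding == "DELTA_BYTE_ARRAY" and physical_type != "BINARY":
        # Same wording as the "Caused by" message in the report.
        raise ParquetDecodingError(
            "Encoding DELTA_BYTE_ARRAY is only supported for type BINARY"
        )
    return f"{encoding} reader for {physical_type}"


def try_read(encoding: str, physical_type: str) -> str:
    """Return a short status string instead of propagating the error."""
    try:
        return "ok: " + get_values_reader(encoding, physical_type)
    except ParquetDecodingError as err:
        return "error: " + str(err)


if __name__ == "__main__":
    # A BINARY column passes the check.
    print(try_read("DELTA_BYTE_ARRAY", "BINARY"))
    # Assumed example of a non-BINARY column; the report does not name the type.
    print(try_read("DELTA_BYTE_ARRAY", "FIXED_LEN_BYTE_ARRAY"))
```

If this reading is right, the writer and the reader disagree about which types may use DELTA_BYTE_ARRAY, so the file is written without error and the mismatch only shows up when it is read.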