Davies Liu created PARQUET-377:
----------------------------------

             Summary: The file written in version 2 can't be read back
                 Key: PARQUET-377
                 URL: https://issues.apache.org/jira/browse/PARQUET-377
             Project: Parquet
          Issue Type: Bug
    Affects Versions: 1.7.0
            Reporter: Davies Liu


I tried to save the TPC-DS table store_sales with the Parquet version 2 writer, but the file can't be read back:

{code}
org.apache.parquet.io.ParquetDecodingException: Can not read value at 852499 in block 0 in file file:/opt/store_sales/part-r-00002-f0497de9-0bf7-4cb2-98d0-b1dcf5a71ca8.gz.parquet
        at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:228)
        at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201)
        at org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.hasNext(SqlNewHadoopRDD.scala:168)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1551)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1843)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1843)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.parquet.io.ParquetDecodingException: Encoding DELTA_BYTE_ARRAY is only supported for type BINARY
        at org.apache.parquet.column.Encoding$7.getValuesReader(Encoding.java:196)
        at org.apache.parquet.column.impl.ColumnReaderImpl.initDataReader(ColumnReaderImpl.java:537)
        at org.apache.parquet.column.impl.ColumnReaderImpl.readPageV2(ColumnReaderImpl.java:577)
        at org.apache.parquet.column.impl.ColumnReaderImpl.access$400(ColumnReaderImpl.java:57)
        at org.apache.parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:521)
        at org.apache.parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:513)
        at org.apache.parquet.column.page.DataPageV2.accept(DataPageV2.java:141)
        at org.apache.parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:513)
        at org.apache.parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:505)
        at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:607)
        at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:407)
        at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:209)
        ... 13 more
{code}
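
A minimal sketch of one way to hit this from the Spark shell, assuming the writer is switched to the v2 page format through the Hadoop configuration key {{parquet.writer.version}} and that a TPC-DS {{store_sales}} table is already registered; the output path and the exact config value ({{PARQUET_2_0}}) are placeholders from memory, not copied from the failing job:

{code}
// Sketch only: assumes spark-shell (Spark 1.5) with `sc` and `sqlContext`,
// and a registered TPC-DS store_sales table. Path is a placeholder.
sc.hadoopConfiguration.set("parquet.writer.version", "PARQUET_2_0")

// Write the table with the version 2 writer.
sqlContext.table("store_sales")
  .write.parquet("/tmp/store_sales_v2")

// Reading it back fails while decoding the DataPageV2 pages,
// with the ParquetDecodingException shown above.
sqlContext.read.parquet("/tmp/store_sales_v2").count()
{code}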



