[ https://issues.apache.org/jira/browse/PARQUET-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876168#comment-14876168 ]
Davies Liu commented on PARQUET-377:
------------------------------------
DeltaByteArrayWriter can be used for FIXED_LEN_BYTE_ARRAY, but the reader
does not accept it; I will send a PR soon.
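The trace below fails at the type check in Encoding.DELTA_BYTE_ARRAY's getValuesReader (Encoding.java:196), which only allows BINARY. A minimal sketch of the kind of change the PR would need, assuming the existing DeltaByteArrayReader can decode the fixed-length values once the check is relaxed (the actual patch may differ):
{code}
// Sketch only: fragment of the Encoding enum in parquet-column, not standalone code.
// Relaxing the type check lets FIXED_LEN_BYTE_ARRAY columns reuse DeltaByteArrayReader.
DELTA_BYTE_ARRAY {
  @Override
  public ValuesReader getValuesReader(ColumnDescriptor descriptor, ValuesType valuesType) {
    PrimitiveTypeName type = descriptor.getType();
    if (type != PrimitiveTypeName.BINARY && type != PrimitiveTypeName.FIXED_LEN_BYTE_ARRAY) {
      throw new ParquetDecodingException(
          "Encoding DELTA_BYTE_ARRAY is only supported for type BINARY and FIXED_LEN_BYTE_ARRAY");
    }
    return new DeltaByteArrayReader();
  }
},
{code}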
> The file written in version 2 can't be read back
> ------------------------------------------------
>
> Key: PARQUET-377
> URL: https://issues.apache.org/jira/browse/PARQUET-377
> Project: Parquet
> Issue Type: Bug
> Affects Versions: 1.7.0
> Reporter: Davies Liu
>
> I tried to save the TPC-DS table store_sales as version 2, but it can't be read back:
> {code}
> org.apache.parquet.io.ParquetDecodingException: Can not read value at 852499 in block 0 in file file:/opt/store_sales/part-r-00002-f0497de9-0bf7-4cb2-98d0-b1dcf5a71ca8.gz.parquet
> at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:228)
> at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:201)
> at org.apache.spark.rdd.SqlNewHadoopRDD$$anon$1.hasNext(SqlNewHadoopRDD.scala:168)
> at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1551)
> at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
> at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
> at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1843)
> at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1843)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:88)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.parquet.io.ParquetDecodingException: Encoding DELTA_BYTE_ARRAY is only supported for type BINARY
> at org.apache.parquet.column.Encoding$7.getValuesReader(Encoding.java:196)
> at org.apache.parquet.column.impl.ColumnReaderImpl.initDataReader(ColumnReaderImpl.java:537)
> at org.apache.parquet.column.impl.ColumnReaderImpl.readPageV2(ColumnReaderImpl.java:577)
> at org.apache.parquet.column.impl.ColumnReaderImpl.access$400(ColumnReaderImpl.java:57)
> at org.apache.parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:521)
> at org.apache.parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:513)
> at org.apache.parquet.column.page.DataPageV2.accept(DataPageV2.java:141)
> at org.apache.parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:513)
> at org.apache.parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:505)
> at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:607)
> at org.apache.parquet.io.RecordReaderImplementation.read(RecordReaderImplementation.java:407)
> at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:209)
> ... 13 more
> {code}
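For reference, the file above was written with the version-2 writer; that is normally switched on through parquet-mr's writer-version property, which Spark's Parquet output path reads from the Hadoop configuration. A minimal sketch, assuming the "parquet.writer.version" key and the "PARQUET_2_0" value (check ParquetOutputFormat.WRITER_VERSION and ParquetProperties.WriterVersion in the release you use):
{code}
// Sketch only: sets the parquet-mr writer-version property on a Hadoop Configuration.
// Property name and value are assumptions to verify against the Parquet release in use.
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.hadoop.ParquetOutputFormat;

public class EnableWriterV2 {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // ParquetOutputFormat.WRITER_VERSION resolves to "parquet.writer.version"
    conf.set(ParquetOutputFormat.WRITER_VERSION, "PARQUET_2_0");
    System.out.println("writer version = " + conf.get(ParquetOutputFormat.WRITER_VERSION));
  }
}
{code}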