gszadovszky commented on a change in pull request #896:
URL: https://github.com/apache/parquet-mr/pull/896#discussion_r618536203



##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/Offsets.java
##########
@@ -68,12 +68,14 @@ public static Offsets getOffsets(SeekableInputStream input, ColumnChunkMetaData
     return new Offsets(firstDataPageOffset, dictionaryPageOffset);
   }
 
-  private static long readDictionaryPageSize(SeekableInputStream in, long pos) throws IOException {
+  private static long readDictionaryPageSize(SeekableInputStream in, ColumnChunkMetaData chunk) throws IOException {
     long origPos = -1;
     try {
       origPos = in.getPos();
+      in.seek(chunk.getStartingPos());

Review comment:
       It is not obvious, and one has to search for this statement in the [Encoding docs](https://github.com/apache/parquet-format/blob/master/Encodings.md#dictionary-encoding-plain_dictionary--2-and-rle_dictionary--8), but it is there:
   > The dictionary page is written first, before the data pages of the column chunk.
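
To illustrate why seeking to the start of the chunk finds the dictionary page, here is a minimal sketch of the layout rule quoted above. It is not the parquet-mr API: the class `ChunkLayoutSketch`, both methods, and their parameters are hypothetical names introduced only for illustration, and it assumes the dictionary page size covers both the page header and the page body.

```java
import java.util.OptionalLong;

// Hypothetical model of a column chunk's layout; NOT the parquet-mr API.
public final class ChunkLayoutSketch {
    private ChunkLayoutSketch() {}

    /**
     * If the chunk has a dictionary page, the format writes it before any
     * data page, so it begins exactly where the chunk begins.
     */
    public static OptionalLong dictionaryPageOffset(long chunkStart, boolean hasDictionaryPage) {
        return hasDictionaryPage ? OptionalLong.of(chunkStart) : OptionalLong.empty();
    }

    /**
     * The first data page follows the dictionary page, if there is one.
     * Assumption: dictionaryPageSize includes the page header and the body.
     */
    public static long firstDataPageOffset(long chunkStart, boolean hasDictionaryPage,
                                           long dictionaryPageSize) {
        return hasDictionaryPage ? chunkStart + dictionaryPageSize : chunkStart;
    }

    public static void main(String[] args) {
        long chunkStart = 4L;           // illustrative value only
        long dictionaryPageSize = 100L; // illustrative value only
        System.out.println(dictionaryPageOffset(chunkStart, true));
        System.out.println(firstDataPageOffset(chunkStart, true, dictionaryPageSize));
    }
}
```

Under this rule, a reader that does not trust the recorded dictionary page offset can still locate the dictionary page from the chunk's starting position alone, which is what the added `in.seek(chunk.getStartingPos())` relies on.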




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

