shangxinli commented on a change in pull request #896: URL: https://github.com/apache/parquet-mr/pull/896#discussion_r618552888
##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/Offsets.java
##########
@@ -68,12 +68,14 @@ public static Offsets getOffsets(SeekableInputStream input, ColumnChunkMetaData
     return new Offsets(firstDataPageOffset, dictionaryPageOffset);
   }
-  private static long readDictionaryPageSize(SeekableInputStream in, long pos) throws IOException {
+  private static long readDictionaryPageSize(SeekableInputStream in, ColumnChunkMetaData chunk) throws IOException {
     long origPos = -1;
     try {
       origPos = in.getPos();
+      in.seek(chunk.getStartingPos());

Review comment:
I know it is true today, but what if that assumption breaks as more and more page types are added? Can we add something to the [Encoding docs](https://github.com/apache/parquet-format/blob/master/Encodings.md#dictionary-encoding-plain_dictionary--2-and-rle_dictionary--8) so that people do not change that assumption?
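
To make the concern concrete: besides documenting the assumption, the reader could fail fast when the first page at the chunk's starting position is not a dictionary page. The following is only a minimal sketch under stated assumptions. `FirstPageGuard`, its nested `PageHeader` class, and `dictionaryPageSize` are hypothetical stand-ins written for illustration, not parquet-mr APIs; the enum merely mirrors the page type names from the Parquet format, and the size is assumed to be the header size plus the compressed page size.

```java
// Hypothetical, self-contained sketch; none of these types come from parquet-mr.
final class FirstPageGuard {

  // Mirrors the page type names defined by the Parquet format.
  enum PageType { DATA_PAGE, INDEX_PAGE, DICTIONARY_PAGE, DATA_PAGE_V2 }

  // Stand-in for a page header parsed at the chunk's starting position.
  static final class PageHeader {
    final PageType type;
    final int headerSize;          // bytes occupied by the serialized header
    final int compressedPageSize;  // bytes occupied by the page body

    PageHeader(PageType type, int headerSize, int compressedPageSize) {
      this.type = type;
      this.headerSize = headerSize;
      this.compressedPageSize = compressedPageSize;
    }
  }

  // Returns the on-disk size of the dictionary page, but only after checking
  // that the first page really is a dictionary page. If a future page type
  // ever precedes it, this throws instead of returning a wrong size.
  static long dictionaryPageSize(PageHeader firstPage) {
    if (firstPage.type != PageType.DICTIONARY_PAGE) {
      throw new IllegalStateException(
          "Expected a dictionary page at the chunk start, found " + firstPage.type);
    }
    return (long) firstPage.headerSize + firstPage.compressedPageSize;
  }

  public static void main(String[] args) {
    PageHeader first = new PageHeader(PageType.DICTIONARY_PAGE, 20, 100);
    System.out.println(dictionaryPageSize(first));
  }
}
```

A check like this would not replace a note in the Encoding docs. It would only turn a silent misread into an explicit error if the assumption is ever broken.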