[GitHub] [parquet-mr] shangxinli commented on a change in pull request #896: PARQUET-2027: Fix calculating directory offset for merge

GitBox Thu, 22 Apr 2021 09:22:16 -0700


shangxinli commented on a change in pull request #896:
URL: https://github.com/apache/parquet-mr/pull/896#discussion_r618552888




##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/Offsets.java
##########
@@ -68,12 +68,14 @@ public static Offsets getOffsets(SeekableInputStream input, ColumnChunkMetaData
     return new Offsets(firstDataPageOffset, dictionaryPageOffset);
   }
 
-  private static long readDictionaryPageSize(SeekableInputStream in, long pos) throws IOException {
+  private static long readDictionaryPageSize(SeekableInputStream in, ColumnChunkMetaData chunk) throws IOException {
     long origPos = -1;
     try {
       origPos = in.getPos();
+      in.seek(chunk.getStartingPos());

Review comment:
       I know this holds today, but what if the assumption breaks as more page types are added? Can we add a note to the [Encoding docs](https://github.com/apache/parquet-format/blob/master/Encodings.md#dictionary-encoding-plain_dictionary--2-and-rle_dictionary--8) so that people do not change that assumption?
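
       To illustrate the concern, here is a minimal sketch of a defensive check, assuming the assumption in question is that the dictionary page, when present, is the first page at the chunk's starting position. `PageType`, `PageHeader`, and `dictionaryPageSize` below are hypothetical stand-ins written for this example, not the parquet-mr classes, and the sizes are made-up numbers:

```java
import java.util.OptionalLong;

final class DictionaryOffsetSketch {

  // Hypothetical stand-in for the format's page types; not the parquet-mr class.
  enum PageType { DATA_PAGE, INDEX_PAGE, DICTIONARY_PAGE, DATA_PAGE_V2 }

  // Hypothetical, simplified page header: the page type, the serialized
  // header length, and the compressed payload length, both in bytes.
  static final class PageHeader {
    final PageType type;
    final int headerSize;
    final int compressedPageSize;

    PageHeader(PageType type, int headerSize, int compressedPageSize) {
      this.type = type;
      this.headerSize = headerSize;
      this.compressedPageSize = compressedPageSize;
    }
  }

  // Returns the on-disk size of the dictionary page (header plus payload) if
  // the first page of the chunk is a dictionary page, or empty otherwise.
  // The type check makes the "dictionary page comes first" assumption explicit
  // instead of silently treating whatever page is found as a dictionary.
  static OptionalLong dictionaryPageSize(PageHeader firstPageInChunk) {
    if (firstPageInChunk.type != PageType.DICTIONARY_PAGE) {
      return OptionalLong.empty();
    }
    return OptionalLong.of(
        (long) firstPageInChunk.headerSize + firstPageInChunk.compressedPageSize);
  }

  public static void main(String[] args) {
    PageHeader dict = new PageHeader(PageType.DICTIONARY_PAGE, 20, 100);
    PageHeader data = new PageHeader(PageType.DATA_PAGE, 20, 100);
    System.out.println(dictionaryPageSize(dict)); // OptionalLong[120]
    System.out.println(dictionaryPageSize(data)); // OptionalLong.empty
  }
}
```

       With a check like this, a future page type placed ahead of the dictionary page would show up as an empty result rather than a wrong offset. Whether the real method should return empty, throw, or scan forward is a separate design decision for the PR.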




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
