[GitHub] [parquet-mr] ggershinsky commented on a diff in pull request #1089: PARQUET-2297: Skip delta problem check for encrypted files

via GitHub Sun, 07 May 2023 03:56:05 -0700


ggershinsky commented on code in PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089#discussion_r1186829851



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetRecordReader.java:
##########
@@ -173,7 +173,10 @@ private void initializeInternalReader(ParquetInputSplit split, Configuration con
       }
     }
 
-    if (!reader.getRowGroups().isEmpty()) {
+    if (!reader.getRowGroups().isEmpty() &&
+      // Encrypted files (parquet-mr 1.12+) can't have the delta encoding problem (resolved in parquet-mr 1.8)

Review Comment:
   - With the delta encoding problem: basically impossible to reproduce :), since 
it was resolved in 1.8.
   - Without this problem: I've had a look at the existing unit tests; 
unfortunately, none can be used as a basis for adding a test for this 
particular situation, so it would require building a new unit test from 
scratch. However, given that a) the patch is small and straightforward, and 
b) Spark has stopped using this parquet read path, building a full unit test 
may be overkill. But if you have a different opinion, please let me know.
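
For illustration only, here is a minimal sketch of the decision the patch makes: the delta encoding check is skipped when the file is encrypted, because encrypted files come from parquet-mr 1.12+ and the problem was resolved in 1.8. The class `DeltaCheckGuard`, the enum `Encryption`, and the method `shouldCheckDeltaProblem` are hypothetical stand-ins and not the parquet-mr API; the actual condition in the PR may be written differently.

```java
// Hypothetical stand-ins for illustration; none of these names come from parquet-mr.
public class DeltaCheckGuard {

    /** Assumed encryption states of a file. */
    enum Encryption { UNENCRYPTED, PLAINTEXT_FOOTER, ENCRYPTED_FOOTER }

    /**
     * Decides whether the delta encoding check is worth running.
     * A file with no row groups has nothing to check, and an encrypted file
     * was written by 1.12+, which postdates the 1.8 fix.
     */
    static boolean shouldCheckDeltaProblem(int rowGroupCount, Encryption encryption) {
        return rowGroupCount > 0 && encryption == Encryption.UNENCRYPTED;
    }

    public static void main(String[] args) {
        // Unencrypted file with row groups: the check still runs.
        System.out.println(shouldCheckDeltaProblem(3, Encryption.UNENCRYPTED));
        // Encrypted file (either footer mode): the check is skipped.
        System.out.println(shouldCheckDeltaProblem(3, Encryption.ENCRYPTED_FOOTER));
        System.out.println(shouldCheckDeltaProblem(3, Encryption.PLAINTEXT_FOOTER));
        // No row groups: nothing to check.
        System.out.println(shouldCheckDeltaProblem(0, Encryption.UNENCRYPTED));
    }
}
```

A unit test for the "without this problem" case would have to cover at least these four combinations, which gives a sense of how small the behavior under discussion is.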



