[
https://issues.apache.org/jira/browse/PARQUET-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17720143#comment-17720143
]
ASF GitHub Bot commented on PARQUET-2297:
-----------------------------------------
Fokko commented on code in PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089#discussion_r1186655384
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetRecordReader.java:
##########
@@ -173,7 +173,10 @@ private void initializeInternalReader(ParquetInputSplit split, Configuration con
}
}
- if (!reader.getRowGroups().isEmpty()) {
+ if (!reader.getRowGroups().isEmpty() &&
+ // Encrypted files (parquet-mr 1.12+) can't have the delta encoding problem (resolved in parquet-mr 1.8)
Review Comment:
Could we add a test for this?
> Encrypted files should not be checked for delta encoding problem
> ----------------------------------------------------------------
>
> Key: PARQUET-2297
> URL: https://issues.apache.org/jira/browse/PARQUET-2297
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: 1.13.0
> Reporter: Gidon Gershinsky
> Assignee: Gidon Gershinsky
> Priority: Major
> Fix For: 1.14.0, 1.13.1
>
>
> The delta encoding problem (https://issues.apache.org/jira/browse/PARQUET-246)
> has been fixed in writers since parquet-mr 1.8. The fix also added a
> `checkDeltaByteArrayProblem` method to readers, which iterates over all columns
> and checks older files for this problem.
> This check now triggers an unrelated exception on encrypted files when a
> reader tries to read an unencrypted column without having the keys for the
> encrypted columns (see https://issues.apache.org/jira/browse/PARQUET-2193).
> This happens in Spark with nested columns; files with regular columns are OK.
> Possible solution: skip the `checkDeltaByteArrayProblem` call for encrypted
> files, because such files can only be written by parquet-mr 1.12 and newer,
> where the delta encoding problem is already fixed.
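
The guard proposed in the description could look roughly like the sketch below. It is only an illustration of the decision logic: `DeltaCheckGuard`, `FileInfo`, `EncryptionKind`, and `shouldCheckDeltaByteArrayProblem` are hypothetical stand-ins, not parquet-mr classes or methods, and the actual change in PR #1089 may be structured differently.

```java
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT parquet-mr classes.
final class DeltaCheckGuard {

    /** Hypothetical summary of how a file's footer describes encryption. */
    enum EncryptionKind { UNENCRYPTED, PLAINTEXT_FOOTER, ENCRYPTED_FOOTER }

    /** Hypothetical minimal view of a file: its row groups and encryption kind. */
    static final class FileInfo {
        final List<String> rowGroups;
        final EncryptionKind encryption;

        FileInfo(List<String> rowGroups, EncryptionKind encryption) {
            this.rowGroups = rowGroups;
            this.encryption = encryption;
        }
    }

    /**
     * Returns true only when the legacy delta-encoding check is worth running:
     * the file has at least one row group and is not encrypted in any mode.
     */
    static boolean shouldCheckDeltaByteArrayProblem(FileInfo file) {
        if (file.rowGroups.isEmpty()) {
            return false; // nothing to inspect
        }
        // Encrypted files can only come from parquet-mr 1.12+, well after the
        // writer-side fix in 1.8, so the legacy check is skipped for them.
        return file.encryption == EncryptionKind.UNENCRYPTED;
    }

    public static void main(String[] args) {
        FileInfo plain = new FileInfo(List.of("rg0"), EncryptionKind.UNENCRYPTED);
        FileInfo encrypted = new FileInfo(List.of("rg0"), EncryptionKind.ENCRYPTED_FOOTER);
        System.out.println("plain file, run check: " + shouldCheckDeltaByteArrayProblem(plain));
        System.out.println("encrypted file, run check: " + shouldCheckDeltaByteArrayProblem(encrypted));
    }
}
```

Keeping the decision in one small predicate also makes it easy to cover with a unit test, which is what the review comment asks for.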