[
https://issues.apache.org/jira/browse/PARQUET-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17111206#comment-17111206
]
ASF GitHub Bot commented on PARQUET-1229:
-----------------------------------------
gszadovszky commented on a change in pull request #776:
URL: https://github.com/apache/parquet-mr/pull/776#discussion_r427284527
##########
File path: parquet-column/src/main/java/org/apache/parquet/internal/column/columnindex/OffsetIndex.java
##########
@@ -49,6 +49,13 @@
    * @return the index of the first row in the page
    */
   public long getFirstRowIndex(int pageIndex);
+
+  /**
+   * @param pageIndex
+   *          the index of the page
+   * @return the original ordinal of the page in the column chunk
+   */
+  public short getPageOrdinal(int pageIndex);
Review comment:
Why do we need it as a `short` instead of keeping it as an `int`? The
parquet.thrift spec never says that we cannot have more pages than `32767`,
even if it is unlikely to have that many.
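To illustrate the concern (a hypothetical sketch, not code from this PR): a `short` page ordinal silently wraps once a column chunk has more than 32767 pages, while an `int` does not.

```java
// Hypothetical sketch: what happens to a page ordinal past Short.MAX_VALUE.
public class PageOrdinalSketch {
    // Narrowing to short silently wraps around for values above 32767.
    static short asShortOrdinal(int pageIndex) {
        return (short) pageIndex;
    }

    public static void main(String[] args) {
        System.out.println(asShortOrdinal(32767)); // prints 32767, the last safe value
        System.out.println(asShortOrdinal(32768)); // prints -32768: silent wrap-around
    }
}
```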
##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesCipher.java
##########
@@ -68,19 +67,32 @@
   public static byte[] createModuleAAD(byte[] fileAAD, ModuleType moduleType,
       short rowGroupOrdinal, short columnOrdinal, short pageOrdinal) {
Review comment:
Theoretically, we don't give hard limits for the number of row groups, the
number of columns, or the number of pages in the spec. There is a de facto
limit in that we use thrift lists, whose size is an i32, meaning that we should
allow Java `int` values here.
Also, there was a post-commit discussion in a [related
PR](https://github.com/apache/parquet-format/pull/142#issuecomment-600294754).
It is unfortunate that by that time parquet-format was already released, so I
don't know if there is a way to properly fix this issue in the format. Anyway,
I would not restrict these values to a `short`.
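A sketch of the alternative this comment argues for (hypothetical helper, not the PR's API): accept `int` ordinals, matching the thrift i32 list sizes, and fail loudly if a value ever exceeds what a narrower encoding can represent, rather than declaring `short` parameters and truncating silently.

```java
// Hypothetical sketch: take int ordinals and validate the range explicitly
// instead of forcing callers to cast down to short themselves.
public class OrdinalCheckSketch {
    static short checkedShortOrdinal(int ordinal) {
        if (ordinal < 0 || ordinal > Short.MAX_VALUE) {
            throw new IllegalArgumentException("Ordinal out of range: " + ordinal);
        }
        return (short) ordinal;
    }
}
```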
##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesGcmDecryptor.java
##########
@@ -98,16 +104,21 @@
         ((lengthBuffer[0] & 0xff));
     if (ciphertextLength < 1) {
-      throw new IOException("Wrong length of encrypted metadata: " + ciphertextLength);
+      throw new ParquetCryptoRuntimeException("Wrong length of encrypted metadata: " + ciphertextLength);
     }
     byte[] ciphertextBuffer = new byte[ciphertextLength];
     gotBytes = 0;
     // Read the encrypted structure contents
     while (gotBytes < ciphertextLength) {
-      int n = from.read(ciphertextBuffer, gotBytes, ciphertextLength - gotBytes);
+      int n;
+      try {
+        n = from.read(ciphertextBuffer, gotBytes, ciphertextLength - gotBytes);
+      } catch (IOException e) {
Review comment:
We should let the `IOException` be thrown out.
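A sketch of that suggestion (hypothetical helper, not the PR's code): declare `throws IOException` on the read loop and let stream failures propagate, instead of catching them and wrapping them in a runtime exception.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullySketch {
    // Reads exactly buf.length bytes; any IOException from the stream
    // propagates to the caller unchanged.
    static void readFully(InputStream from, byte[] buf) throws IOException {
        int gotBytes = 0;
        while (gotBytes < buf.length) {
            int n = from.read(buf, gotBytes, buf.length - gotBytes);
            if (n < 0) {
                throw new EOFException("Stream ended after " + gotBytes + " bytes");
            }
            gotBytes += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] out = new byte[4];
        readFully(new ByteArrayInputStream(new byte[] {1, 2, 3, 4}), out);
        System.out.println(out[3]); // prints 4
    }
}
```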
##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/crypto/AesGcmDecryptor.java
##########
@@ -70,23 +69,30 @@
       if (null != AAD) cipher.updateAAD(AAD);
       cipher.doFinal(ciphertext, inputOffset, inputLength, plainText, outputOffset);
-    } catch (GeneralSecurityException e) {
-      throw new IOException("Failed to decrypt", e);
+    } catch (AEADBadTagException e) {
+      throw new TagVerificationException("GCM tag check failed", e);
+    } catch (GeneralSecurityException e) {
+      throw new ParquetCryptoRuntimeException("Failed to decrypt", e);
     }
     return plainText;
   }

   @Override
-  public byte[] decrypt(InputStream from, byte[] AAD) throws IOException {
+  public byte[] decrypt(InputStream from, byte[] AAD) {
     byte[] lengthBuffer = new byte[SIZE_LENGTH];
     int gotBytes = 0;
     // Read the length of encrypted Thrift structure
     while (gotBytes < SIZE_LENGTH) {
-      int n = from.read(lengthBuffer, gotBytes, SIZE_LENGTH - gotBytes);
+      int n;
+      try {
+        n = from.read(lengthBuffer, gotBytes, SIZE_LENGTH - gotBytes);
+      } catch (IOException e) {
Review comment:
We should let the `IOException` be thrown out.
##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
##########
@@ -1185,70 +1275,189 @@ static long getOffset(ColumnChunk columnChunk) {
     return offset;
   }

+  private static void verifyFooterIntegrity(InputStream from, InternalFileDecryptor fileDecryptor,
+      int combinedFooterLength) throws IOException {
+
+    byte[] nonce = new byte[AesCipher.NONCE_LENGTH];
+    from.read(nonce);
+    byte[] gcmTag = new byte[AesCipher.GCM_TAG_LENGTH];
+    from.read(gcmTag);
+
+    AesGcmEncryptor footerSigner = fileDecryptor.createSignedFooterEncryptor();
+
+    byte[] footerAndSignature = ((ByteBufferInputStream) from).slice(0).array();
+    int footerSignatureLength = AesCipher.NONCE_LENGTH + AesCipher.GCM_TAG_LENGTH;
+    byte[] serializedFooter = new byte[combinedFooterLength - footerSignatureLength];
+    System.arraycopy(footerAndSignature, 0, serializedFooter, 0, serializedFooter.length);
+
+    byte[] signedFooterAAD = AesCipher.createFooterAAD(fileDecryptor.getFileAAD());
+    byte[] encryptedFooterBytes = footerSigner.encrypt(false, serializedFooter, nonce, signedFooterAAD);
+    byte[] calculatedTag = new byte[AesCipher.GCM_TAG_LENGTH];
+    System.arraycopy(encryptedFooterBytes, encryptedFooterBytes.length - AesCipher.GCM_TAG_LENGTH,
+        calculatedTag, 0, AesCipher.GCM_TAG_LENGTH);
+    if (!Arrays.equals(gcmTag, calculatedTag)) {
+      throw new TagVerificationException("Signature mismatch in plaintext footer");
+    }
+  }
+
   public ParquetMetadata readParquetMetadata(final InputStream from, MetadataFilter filter) throws IOException {
+    return readParquetMetadata(from, filter, null, false, 0);
+  }
+
+  public ParquetMetadata readParquetMetadata(final InputStream from, MetadataFilter filter,
+      final InternalFileDecryptor fileDecryptor, final boolean encryptedFooter,
+      final int combinedFooterLength) throws IOException {
+
+    final BlockCipher.Decryptor footerDecryptor = (encryptedFooter? fileDecryptor.fetchFooterDecryptor() : null);
+    final byte[] encryptedFooterAAD = (encryptedFooter? AesCipher.createFooterAAD(fileDecryptor.getFileAAD()) : null);
+
     FileMetaData fileMetaData = filter.accept(new MetadataFilterVisitor<FileMetaData, IOException>() {
       @Override
       public FileMetaData visit(NoFilter filter) throws IOException {
-        return readFileMetaData(from);
+        return readFileMetaData(from, footerDecryptor, encryptedFooterAAD);
       }
       @Override
       public FileMetaData visit(SkipMetadataFilter filter) throws IOException {
-        return readFileMetaData(from, true);
+        return readFileMetaData(from, true, footerDecryptor, encryptedFooterAAD);
       }
       @Override
       public FileMetaData visit(OffsetMetadataFilter filter) throws IOException {
-        return filterFileMetaDataByStart(readFileMetaData(from), filter);
+        return filterFileMetaDataByStart(readFileMetaData(from, footerDecryptor, encryptedFooterAAD), filter);
      }
       @Override
       public FileMetaData visit(RangeMetadataFilter filter) throws IOException {
-        return filterFileMetaDataByMidpoint(readFileMetaData(from), filter);
+        return filterFileMetaDataByMidpoint(readFileMetaData(from, footerDecryptor, encryptedFooterAAD), filter);
       }
     });
     LOG.debug("{}", fileMetaData);
-    ParquetMetadata parquetMetadata = fromParquetMetadata(fileMetaData);
+
+    if (!encryptedFooter && null != fileDecryptor) {
+      if (!fileMetaData.isSetEncryption_algorithm()) { // Plaintext file
Review comment:
Is `plaintext` a usual term for _un-encrypted_ files? I don't really like it,
but it is fine if it is commonly used in that sense. (Plaintext files for me
are the `*.txt` files.)
##########
File path: parquet-format-structures/src/main/java/org/apache/parquet/format/BlockCipher.java
##########
@@ -51,19 +49,17 @@
    * Parquet Modular Encryption specification.
    * @param AAD - Additional Authenticated Data for the decryption (ignored in case of CTR cipher)
    * @return plaintext - starts at offset 0 of the output value, and fills up the entire byte array.
-   * @throws IOException thrown upon any crypto problem encountered during decryption
    */
-  public byte[] decrypt(byte[] lengthAndCiphertext, byte[] AAD) throws IOException;
+  public byte[] decrypt(byte[] lengthAndCiphertext, byte[] AAD);

   /**
    * Convenience decryption method that reads the length and ciphertext from the input stream.
    *
    * @param from Input stream with length and ciphertext.
    * @param AAD - Additional Authenticated Data for the decryption (ignored in case of CTR cipher)
    * @return plaintext - starts at offset 0 of the output, and fills up the entire byte array.
-   * @throws IOException thrown upon any crypto or IO problem encountered during decryption
    */
-  public byte[] decrypt(InputStream from, byte[] AAD) throws IOException;
+  public byte[] decrypt(InputStream from, byte[] AAD);
Review comment:
I would keep `throws IOException` here. `InputStream` objects throw
`IOException`, so the caller shall be prepared to handle these.
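The shape this comment argues for might look like the following sketch (the interface body and `NoOpDecryptor` are illustrative only, not the PR's implementation): keep the checked exception on the stream-based overload, since only that overload performs I/O.

```java
import java.io.IOException;
import java.io.InputStream;

public class DecryptorSketch {
    // Sketch: the in-memory variant stays unchecked; the stream variant keeps
    // throws IOException because InputStream.read can fail.
    interface Decryptor {
        byte[] decrypt(byte[] lengthAndCiphertext, byte[] AAD);
        byte[] decrypt(InputStream from, byte[] AAD) throws IOException;
    }

    // Trivial identity "decryptor" just to make the sketch compile and run.
    static class NoOpDecryptor implements Decryptor {
        public byte[] decrypt(byte[] lengthAndCiphertext, byte[] AAD) {
            return lengthAndCiphertext;
        }
        public byte[] decrypt(InputStream from, byte[] AAD) throws IOException {
            return from.readAllBytes(); // IOException propagates to the caller
        }
    }
}
```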
##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
##########
@@ -1185,70 +1275,189 @@ static long getOffset(ColumnChunk columnChunk) {
     return offset;
   }

+  private static void verifyFooterIntegrity(InputStream from, InternalFileDecryptor fileDecryptor,
+      int combinedFooterLength) throws IOException {
+
+    byte[] nonce = new byte[AesCipher.NONCE_LENGTH];
+    from.read(nonce);
+    byte[] gcmTag = new byte[AesCipher.GCM_TAG_LENGTH];
+    from.read(gcmTag);
+
+    AesGcmEncryptor footerSigner = fileDecryptor.createSignedFooterEncryptor();
+
+    byte[] footerAndSignature = ((ByteBufferInputStream) from).slice(0).array();
+    int footerSignatureLength = AesCipher.NONCE_LENGTH + AesCipher.GCM_TAG_LENGTH;
+    byte[] serializedFooter = new byte[combinedFooterLength - footerSignatureLength];
+    System.arraycopy(footerAndSignature, 0, serializedFooter, 0, serializedFooter.length);
+
+    byte[] signedFooterAAD = AesCipher.createFooterAAD(fileDecryptor.getFileAAD());
+    byte[] encryptedFooterBytes = footerSigner.encrypt(false, serializedFooter, nonce, signedFooterAAD);
+    byte[] calculatedTag = new byte[AesCipher.GCM_TAG_LENGTH];
+    System.arraycopy(encryptedFooterBytes, encryptedFooterBytes.length - AesCipher.GCM_TAG_LENGTH,
+        calculatedTag, 0, AesCipher.GCM_TAG_LENGTH);
+    if (!Arrays.equals(gcmTag, calculatedTag)) {
+      throw new TagVerificationException("Signature mismatch in plaintext footer");
+    }
+  }
+
   public ParquetMetadata readParquetMetadata(final InputStream from, MetadataFilter filter) throws IOException {
+    return readParquetMetadata(from, filter, null, false, 0);
+  }
+
+  public ParquetMetadata readParquetMetadata(final InputStream from, MetadataFilter filter,
+      final InternalFileDecryptor fileDecryptor, final boolean encryptedFooter,
+      final int combinedFooterLength) throws IOException {
+
+    final BlockCipher.Decryptor footerDecryptor = (encryptedFooter? fileDecryptor.fetchFooterDecryptor() : null);
+    final byte[] encryptedFooterAAD = (encryptedFooter? AesCipher.createFooterAAD(fileDecryptor.getFileAAD()) : null);
+
     FileMetaData fileMetaData = filter.accept(new MetadataFilterVisitor<FileMetaData, IOException>() {
       @Override
       public FileMetaData visit(NoFilter filter) throws IOException {
-        return readFileMetaData(from);
+        return readFileMetaData(from, footerDecryptor, encryptedFooterAAD);
       }
       @Override
       public FileMetaData visit(SkipMetadataFilter filter) throws IOException {
-        return readFileMetaData(from, true);
+        return readFileMetaData(from, true, footerDecryptor, encryptedFooterAAD);
      }
       @Override
       public FileMetaData visit(OffsetMetadataFilter filter) throws IOException {
-        return filterFileMetaDataByStart(readFileMetaData(from), filter);
+        return filterFileMetaDataByStart(readFileMetaData(from, footerDecryptor, encryptedFooterAAD), filter);
       }
       @Override
       public FileMetaData visit(RangeMetadataFilter filter) throws IOException {
-        return filterFileMetaDataByMidpoint(readFileMetaData(from), filter);
+        return filterFileMetaDataByMidpoint(readFileMetaData(from, footerDecryptor, encryptedFooterAAD), filter);
       }
     });
     LOG.debug("{}", fileMetaData);
-    ParquetMetadata parquetMetadata = fromParquetMetadata(fileMetaData);
+
+    if (!encryptedFooter && null != fileDecryptor) {
+      if (!fileMetaData.isSetEncryption_algorithm()) { // Plaintext file
+        fileDecryptor.setPlaintextFile();
+        // Done to detect files that were not encrypted by mistake
+        if (!fileDecryptor.plaintextFilesAllowed()) {
+          throw new IOException("Applying decryptor on plaintext file");
Review comment:
I think `ParquetCryptoRuntimeException` would fit better here.
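As a sketch of the suggestion (the exception class name is taken from the PR; the check itself is simplified and hypothetical): applying a decryptor to an unencrypted file is a configuration error rather than an I/O failure, so an unchecked crypto exception reads better than `IOException`.

```java
public class PlaintextCheckSketch {
    // Unchecked exception, named after the one in the PR's crypto package.
    static class ParquetCryptoRuntimeException extends RuntimeException {
        ParquetCryptoRuntimeException(String message) { super(message); }
    }

    // Simplified stand-in for the footer check in ParquetMetadataConverter.
    static void checkPlaintextAllowed(boolean plaintextFilesAllowed) {
        if (!plaintextFilesAllowed) {
            throw new ParquetCryptoRuntimeException("Applying decryptor on plaintext file");
        }
    }
}
```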
##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
##########
@@ -1185,70 +1275,189 @@ static long getOffset(ColumnChunk columnChunk) {
     return offset;
   }

+  private static void verifyFooterIntegrity(InputStream from, InternalFileDecryptor fileDecryptor,
+      int combinedFooterLength) throws IOException {
+
+    byte[] nonce = new byte[AesCipher.NONCE_LENGTH];
+    from.read(nonce);
+    byte[] gcmTag = new byte[AesCipher.GCM_TAG_LENGTH];
+    from.read(gcmTag);
+
+    AesGcmEncryptor footerSigner = fileDecryptor.createSignedFooterEncryptor();
+
+    byte[] footerAndSignature = ((ByteBufferInputStream) from).slice(0).array();
+    int footerSignatureLength = AesCipher.NONCE_LENGTH + AesCipher.GCM_TAG_LENGTH;
+    byte[] serializedFooter = new byte[combinedFooterLength - footerSignatureLength];
+    System.arraycopy(footerAndSignature, 0, serializedFooter, 0, serializedFooter.length);
+
+    byte[] signedFooterAAD = AesCipher.createFooterAAD(fileDecryptor.getFileAAD());
+    byte[] encryptedFooterBytes = footerSigner.encrypt(false, serializedFooter, nonce, signedFooterAAD);
+    byte[] calculatedTag = new byte[AesCipher.GCM_TAG_LENGTH];
+    System.arraycopy(encryptedFooterBytes, encryptedFooterBytes.length - AesCipher.GCM_TAG_LENGTH,
+        calculatedTag, 0, AesCipher.GCM_TAG_LENGTH);
+    if (!Arrays.equals(gcmTag, calculatedTag)) {
+      throw new TagVerificationException("Signature mismatch in plaintext footer");
+    }
+  }
+
   public ParquetMetadata readParquetMetadata(final InputStream from, MetadataFilter filter) throws IOException {
+    return readParquetMetadata(from, filter, null, false, 0);
+  }
+
+  public ParquetMetadata readParquetMetadata(final InputStream from, MetadataFilter filter,
+      final InternalFileDecryptor fileDecryptor, final boolean encryptedFooter,
+      final int combinedFooterLength) throws IOException {
+
+    final BlockCipher.Decryptor footerDecryptor = (encryptedFooter? fileDecryptor.fetchFooterDecryptor() : null);
+    final byte[] encryptedFooterAAD = (encryptedFooter? AesCipher.createFooterAAD(fileDecryptor.getFileAAD()) : null);
+
     FileMetaData fileMetaData = filter.accept(new MetadataFilterVisitor<FileMetaData, IOException>() {
       @Override
       public FileMetaData visit(NoFilter filter) throws IOException {
-        return readFileMetaData(from);
+        return readFileMetaData(from, footerDecryptor, encryptedFooterAAD);
       }
       @Override
       public FileMetaData visit(SkipMetadataFilter filter) throws IOException {
-        return readFileMetaData(from, true);
+        return readFileMetaData(from, true, footerDecryptor, encryptedFooterAAD);
       }
       @Override
       public FileMetaData visit(OffsetMetadataFilter filter) throws IOException {
-        return filterFileMetaDataByStart(readFileMetaData(from), filter);
+        return filterFileMetaDataByStart(readFileMetaData(from, footerDecryptor, encryptedFooterAAD), filter);
       }
       @Override
       public FileMetaData visit(RangeMetadataFilter filter) throws IOException {
-        return filterFileMetaDataByMidpoint(readFileMetaData(from), filter);
+        return filterFileMetaDataByMidpoint(readFileMetaData(from, footerDecryptor, encryptedFooterAAD), filter);
       }
     });
     LOG.debug("{}", fileMetaData);
-    ParquetMetadata parquetMetadata = fromParquetMetadata(fileMetaData);
+
+    if (!encryptedFooter && null != fileDecryptor) {
+      if (!fileMetaData.isSetEncryption_algorithm()) { // Plaintext file
Review comment:
Actually, based on the parquet-mr README:
> Generally speaking, stick to the Sun Java Code Conventions
Based on the [related
section](https://www.oracle.com/java/technologies/javase/codeconventions-comments.html#286)
both should be fine.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> parquet-mr code changes for encryption support
> ----------------------------------------------
>
> Key: PARQUET-1229
> URL: https://issues.apache.org/jira/browse/PARQUET-1229
> Project: Parquet
> Issue Type: Sub-task
> Components: parquet-mr
> Reporter: Gidon Gershinsky
> Assignee: Gidon Gershinsky
> Priority: Major
> Labels: pull-request-available
>
> Addition of encryption/decryption support to the existing Parquet classes and
> APIs
--
This message was sent by Atlassian Jira
(v8.3.4#803005)