Hello.

I am testing Parquet modular encryption with Spark 3.3 and Parquet 1.12.2:

```scala
// Use the properties-driven crypto factory with the in-memory mock KMS
spark.sparkContext.hadoopConfiguration.set("parquet.crypto.factory.class", "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory")
spark.sparkContext.hadoopConfiguration.set("parquet.encryption.kms.client.class", "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS")
spark.sparkContext.hadoopConfiguration.set("parquet.encryption.key.list", "k1:AAECAwQFBgcICQoLDA0ODw==")
// Keep the footer in plaintext and encrypt only the nested column rider.bar
spark.sparkContext.hadoopConfiguration.set("parquet.encryption.plaintext.footer", "true")
spark.sparkContext.hadoopConfiguration.set("parquet.encryption.footer.key", "k1")
spark.sparkContext.hadoopConfiguration.set("parquet.encryption.column.keys", "k1:rider.bar")

val df = spark.sql("select uuid() as uuid, named_struct('foo', 2, 'bar', 3) as rider")
df.write.format("parquet").mode("overwrite").save("/tmp/enc")
```

Now, from a new Spark session, I expect to be able to read rider.foo, since it is not encrypted. But apparently foo is not accessible because its sibling bar is encrypted. I suspect something on the Parquet side (not Spark related) makes unencrypted nested fields unreadable when a sibling field is encrypted. The issue does not occur with flat columns, only with nested ones.

```
spark.read.format("parquet").load("/tmp/enc").selectExpr("rider.foo").show

Caused by: org.apache.parquet.crypto.ParquetCryptoRuntimeException: [rider, bar]. Null File Decryptor
  at org.apache.parquet.hadoop.metadata.EncryptedColumnChunkMetaData.decryptIfNeeded(ColumnChunkMetaData.java:602)
  at org.apache.parquet.hadoop.metadata.ColumnChunkMetaData.getEncodings(ColumnChunkMetaData.java:348)
  at org.apache.parquet.hadoop.ParquetRecordReader.checkDeltaByteArrayProblem(ParquetRecordReader.java:191)
  at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:177)
  at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
```
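For comparison, this is the flat-column variant I meant, which does not hit the problem. A sketch of what I ran; the path and column names here are just for illustration:

```scala
// Same setup as above, but encrypting a flat column instead of a nested field.
spark.sparkContext.hadoopConfiguration.set("parquet.crypto.factory.class", "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory")
spark.sparkContext.hadoopConfiguration.set("parquet.encryption.kms.client.class", "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS")
spark.sparkContext.hadoopConfiguration.set("parquet.encryption.key.list", "k1:AAECAwQFBgcICQoLDA0ODw==")
spark.sparkContext.hadoopConfiguration.set("parquet.encryption.plaintext.footer", "true")
spark.sparkContext.hadoopConfiguration.set("parquet.encryption.footer.key", "k1")
// Flat column "bar" is encrypted; flat column "foo" is not
spark.sparkContext.hadoopConfiguration.set("parquet.encryption.column.keys", "k1:bar")

spark.sql("select uuid() as uuid, 2 as foo, 3 as bar")
  .write.format("parquet").mode("overwrite").save("/tmp/enc_flat")

// In a new session without the encryption properties, selecting only
// the unencrypted flat column succeeds, unlike the nested case above:
spark.read.format("parquet").load("/tmp/enc_flat").selectExpr("foo").show
```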

Thanks
