Trying to project columns without authorization can be very costly, for two
reasons:
- unnecessary per-column/per-file calls to the (remote) KMS service, plus the
cost of an authorization check on each call
- red-flagging of the unauthorized calls, which triggers "breach attempt" alerts

IMO, the best way to handle this is a layer on top of Parquet that gets the
list of columns the reader is authorized to see (e.g. from a policy engine)
and projects only those, returning nulls for the others.
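A minimal sketch of such a layer, in Python. It assumes a hypothetical
`PolicyEngine` interface and a caller-supplied column reader that stands in
for the real Parquet reader; neither is an existing Parquet API. The point it
illustrates is that unauthorized columns are never handed to the reader, so no
KMS request or authorization check is issued for them. The row count is also
assumed to be known up front (e.g. from file metadata).

```python
from typing import Callable, Dict, List, Optional, Protocol, Sequence, Set


class PolicyEngine(Protocol):
    """Hypothetical policy-engine interface (an assumption, not a Parquet API)."""

    def authorized_columns(self, user: str, table: str) -> Set[str]:
        ...


# Hypothetical column-projecting reader: column names -> {column: values}.
# In practice this would wrap an actual Parquet reader.
ColumnReader = Callable[[Sequence[str]], Dict[str, List[object]]]


def read_with_policy(
    user: str,
    table: str,
    requested: Sequence[str],
    policy: PolicyEngine,
    read_columns: ColumnReader,
    num_rows: int,
) -> Dict[str, List[Optional[object]]]:
    """Project only the authorized columns and fill the rest with nulls."""
    allowed = policy.authorized_columns(user, table)
    permitted = [c for c in requested if c in allowed]
    # Only permitted columns reach the reader, so unauthorized columns
    # never cause a key fetch from the KMS.
    data = read_columns(permitted) if permitted else {}
    result: Dict[str, List[Optional[object]]] = {}
    for col in requested:
        result[col] = data[col] if col in allowed else [None] * num_rows
    return result


# Usage with an in-memory policy and a fake reader (both illustrative).
class StaticPolicy:
    def __init__(self, grants: Dict[tuple, Set[str]]):
        self.grants = grants

    def authorized_columns(self, user: str, table: str) -> Set[str]:
        return self.grants.get((user, table), set())


table_data = {"id": [1, 2], "name": ["a", "b"], "ssn": ["x", "y"]}
calls: List[List[str]] = []


def fake_reader(cols: Sequence[str]) -> Dict[str, List[object]]:
    calls.append(list(cols))  # record what the reader was asked for
    return {c: table_data[c] for c in cols}


policy = StaticPolicy({("analyst", "people"): {"id", "name"}})
out = read_with_policy(
    "analyst", "people", ["id", "name", "ssn"], policy, fake_reader, num_rows=2
)
```

Here `out` is `{"id": [1, 2], "name": ["a", "b"], "ssn": [None, None]}`, and
the reader was only ever asked for `["id", "name"]`, so a `select *` works
without touching the key for `ssn`.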

Cheers, Gidon


On Thu, Oct 27, 2022 at 1:01 AM nicolas paris <[email protected]>
wrote:

> hello,
>
> as mentioned in several places [1], from a data analyst's point of view,
> getting null values for encrypted columns one has no key to decrypt
> is better than getting exceptions, and it eases data exploration by
> allowing select * instead of listing each allowed column.
>
> I have been digging through the crypto source code to find an easy way to
> catch crypto exceptions and turn the values into nulls from within the
> DecryptionPropertiesFactory, which can be passed to the query engine
> through Hadoop configs.
>
> I might be missing something, but I haven't found a way to tell the
> ParquetReader to return nulls and keep reading the unencrypted columns
> when something goes wrong with the KMS.
>
> Is such behavior available, or would you be willing to add such a feature
> at the Parquet level in the future?
>
> Thanks
>
>
> [1] https://www.uber.com/en-FR/blog/one-stone-three-birds-finer-grained-encryption-apache-parquet/
>
