thanks again for your guidance and for your work on this.
that makes sense
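
For the record, here is a minimal sketch of how I understand the layer described below: it takes the list of authorized columns for the reader, only ever asks the underlying reader for those, and fills the others with nulls. Everything in it is hypothetical and for illustration only (split_projection, read_with_nulls, fake_reader, and the idea that the authorized set comes from a policy engine); it does not use any parquet API.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Row = Dict[str, object]


def split_projection(requested: Iterable[str],
                     authorized: Set[str]) -> Tuple[List[str], List[str]]:
    """Split requested columns into (allowed, denied), keeping request order."""
    requested = list(requested)
    allowed = [c for c in requested if c in authorized]
    denied = [c for c in requested if c not in authorized]
    return allowed, denied


def read_with_nulls(requested: Iterable[str],
                    authorized: Set[str],
                    read_columns: Callable[[List[str]], List[Row]]) -> List[Row]:
    """Read only authorized columns and return None for the others.

    read_columns is a stand-in for the real reader (hypothetical). It is
    never asked for a denied column, so no key request is made for it.
    If no requested column is authorized, this sketch returns no rows.
    """
    requested = list(requested)
    allowed, _denied = split_projection(requested, authorized)
    rows = read_columns(allowed) if allowed else []
    return [
        {c: (row.get(c) if c in allowed else None) for c in requested}
        for row in rows
    ]


# Hypothetical in-memory reader that records which columns were asked for.
asked_for: List[str] = []


def fake_reader(columns: List[str]) -> List[Row]:
    asked_for.extend(columns)
    data = [
        {"id": 1, "name": "a", "ssn": "111"},
        {"id": 2, "name": "b", "ssn": "222"},
    ]
    return [{c: r[c] for c in columns} for r in data]


result = read_with_nulls(["id", "ssn", "name"], {"id", "name"}, fake_reader)
```

With this shape, a select * would become "all columns requested, only the authorized ones actually read", so the KMS is never called for a column the reader has no right to see.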

On Thu, 2022-10-27 at 10:45 +0300, Gidon Gershinsky wrote:
> trying to project columns without authorization can be very costly,
> for two reasons:
> - unnecessary per-column/file calls to the (remote) KMS service, plus
>   the cost of per-call authorization checks
> - red-flagging unauthorized calls and triggering "breach attempt"
>   alerts
> 
> IMO, the best way to handle this is to have a layer on top of parquet
> that gets the list of authorized columns for the reader (e.g. from a
> policy engine) and allows projecting only those columns (returning
> nulls for the others)
> 
> Cheers, Gidon
> 
> 
> On Thu, Oct 27, 2022 at 1:01 AM nicolas paris
> <[email protected]>
> wrote:
> 
> > hello,
> > 
> > as mentioned in several places [1], from a data analyst's point of
> > view, getting null values for encrypted columns when one has no key
> > to decrypt them is better than getting exceptions. It eases data
> > exploration by allowing select * instead of listing each allowed
> > column.
> > 
> > I have been digging through the crypto source code to find an easy
> > way to catch crypto exceptions and turn the values into nulls from
> > the DecryptionPropertiesFactory that can be passed to the query
> > engine through Hadoop configs.
> > 
> > I might be missing something, but I haven't found a way to tell the
> > ParquetReader to put nulls and go ahead reading unencrypted columns
> > when something goes wrong with the KMS.
> > 
> > Is such behavior available, or are you willing to add such a feature
> > at the parquet level in the future?
> > 
> > Thanks
> > 
> > 
> > [1]
> > 
> > https://www.uber.com/en-FR/blog/one-stone-three-birds-finer-grained-encryption-apache-parquet/
> > 
