[GitHub] [arrow] ggershinsky commented on pull request #9631: ARROW-11644: [Python][Parquet] Low-level Parquet decryption in Python

GitBox Mon, 03 May 2021 10:24:36 -0700


ggershinsky commented on pull request #9631:
URL: https://github.com/apache/arrow/pull/9631#issuecomment-831408837



   I know Itamar has put a lot of time into developing this capability for his 
customer (which I appreciate), and into contributing it to open source, so 
I'd be glad to repeat the reasons for my concern, and to expand on them.
   IMO, the core of the problem at hand is that the low-level API looks 
deceptively simple, while the high-level API seems more restrictive and less 
intuitive. I know who is to blame, because I've basically designed both of them 
:) But this has been built from the ground up: first the spec, then the 
low-level layer, and then, with time and field experience, the high-level 
layer; so there is no way to hide the low-level layer now. Its take on 
encryption seems to be "just give the key and its id, and we're done". I know 
that a handful of top data encryption experts won't think like that, and will 
look for ways to handle the NIST limit on the number of AES-GCM operations per 
key (so that the cipher is not broken), and for ways to perform key rotation 
and other standard data security procedures. But I'm pretty sure most end 
users of PyArrow and pandas will use the low-level API, if it is exposed, in 
the "intuitive" way.
   Another reason is compatibility with Apache Spark and other frameworks, which 
never exposed the low-level layer and will start offering Parquet encryption 
via the high-level API. It would be really good if Spark were able to read 
files produced by PyArrow, and vice versa.
   It will take some time until a full high-level API implementation is 
available in Arrow. I understand and appreciate the pressure to have Parquet 
encryption available ASAP, but (at least in my view :) it is worth waiting a 
bit longer for the safe and compatible high-level layer. Work on Parquet 
encryption started in 2017; what's a few more months...
