GPSnoopy edited a comment on pull request #9631: URL: https://github.com/apache/arrow/pull/9631#issuecomment-831553755
Hi @ggershinsky, I'm one of the end-users pestering @itamarst. ;-) I don't claim to be an encryption expert, so the following feedback is purely from a user/developer perspective.

- We already have an in-house crypto library which handles the security choices, design and the integration with our KMS.
- The integration of this library with Apache Arrow/Parquet (via ParquetSharp) is about 10-20 lines of code.
- This crypto library generates the AES key, encrypts it using asymmetric keys (obtained via the KMS, driven by a company-internal, user-provided key identifier), adds some extra necessary header information and publishes that to Parquet as the key identifier.
- It also deals with user authentication and key permissions.
- This means that the way we manage Parquet encryption inside the company is consistent with the rest of the company and approved by the various security teams.
- Being compatible with other external tools and a de-facto Parquet encryption high-level standard is nice, but ultimately the company cares about its own sensitive IP. So being compatible with the company ecosystem is a higher priority than being compatible with Spark (ultimately we will never share encrypted files with other companies, which is kind of the main point).
- The low-level API is used internally by us in both C++ and C#. So why is Python different?
- I'm not sure I understand or appreciate the reluctance to provide both the low-level and the higher-level API. It's a really nice property of a library to expose various levels of abstraction, such that the user can integrate with the library at the required level. Having both APIs means that you provide the correct default behaviour and compatibility with the Spark ecosystem for your users, and also **provide the necessary flexibility for users with use-cases you have not anticipated or foreseen**.
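To make the key-handling flow in the list above more concrete, here is a minimal Python sketch of the general pattern (generate a data key, wrap it, pack it with a small header, and later recover it from that header). Everything in it is an assumption for illustration: `KeyWrapper`, `InsecureDemoWrapper`, `new_key_with_metadata`, `retrieve_key` and the JSON header layout are hypothetical names and choices, not the in-house library and not any Arrow/Parquet API. The demo wrapper is a deliberately insecure placeholder for the asymmetric, KMS-backed step.

```python
import base64
import hashlib
import json
import secrets
from typing import Protocol, Tuple


class KeyWrapper(Protocol):
    """Hypothetical interface standing in for the crypto library / KMS client."""

    def wrap(self, key_id: str, data_key: bytes) -> bytes: ...
    def unwrap(self, key_id: str, wrapped: bytes) -> bytes: ...


class InsecureDemoWrapper:
    """Placeholder only: XOR with a hash-derived pad. NOT real encryption."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret

    def _pad(self, key_id: str, length: int) -> bytes:
        # SHA-256 yields 32 bytes, enough to cover a 16-byte data key.
        return hashlib.sha256(self._secret + key_id.encode("utf-8")).digest()[:length]

    def wrap(self, key_id: str, data_key: bytes) -> bytes:
        pad = self._pad(key_id, len(data_key))
        return bytes(a ^ b for a, b in zip(data_key, pad))

    unwrap = wrap  # XOR is its own inverse


def new_key_with_metadata(wrapper: KeyWrapper, key_id: str) -> Tuple[bytes, bytes]:
    """Generate a 128-bit data key and the metadata blob that travels with the file."""
    data_key = secrets.token_bytes(16)
    header = {
        "v": 1,  # assumed header version field
        "key_id": key_id,  # company-internal key identifier
        "wrapped": base64.b64encode(wrapper.wrap(key_id, data_key)).decode("ascii"),
    }
    return data_key, json.dumps(header).encode("utf-8")


def retrieve_key(wrapper: KeyWrapper, key_metadata: bytes) -> bytes:
    """Recover the data key from the metadata blob stored alongside the file."""
    header = json.loads(key_metadata.decode("utf-8"))
    return wrapper.unwrap(header["key_id"], base64.b64decode(header["wrapped"]))
```

With a low-level API, the writer side would pass the data key and the metadata blob to the Parquet encryption properties, and the reader side would plug something like `retrieve_key` into the key-retrieval hook. The point of the sketch is only that the glue is small when the library exposes that level.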

IMHO the last point (flexibility for use-cases you have not anticipated) should be carefully considered, as it's reflected and used in highly acclaimed libraries and APIs, such as the C++ STL, Boost, Zlib, OpenSSL, Vulkan, etc. (personal bias in this choice of libraries, of course; interestingly, DirectX12/Vulkan do prove a point though: developers want more fine-grained access and more levels of control in their API, not less).

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
