Hi Itamar,

I implemented some Python wrappers for the low-level API and would be happy to collaborate on that. The reason I haven't pushed this forward yet is what Gidon mentioned: the API to expose to Python users needs to be finalized first, and it must include the key-tools API for interoperability with Spark.
Perhaps it would be good to kick off a discussion on what the pyarrow API for PME should look like (in parallel to reviewing the arrow-cpp implementation of key tools, to ensure that wrapping it would be a reasonable effort). One possible approach is to expose both the low-level API and the key tools separately: a user would create and initialize a PropertiesDrivenCryptoFactory and use it to create the FileEncryptionProperties/FileDecryptionProperties to pass to the lower-level API. In pandas this would translate to something like:

```
factory = PropertiesDrivenCryptoFactory(...)
df.to_parquet(path, engine="pyarrow", encryption=factory.getFileEncryptionProperties(...))
df = pd.read_parquet(path, engine="pyarrow", decryption=factory.getFileDecryptionProperties(...))
```

This should also work for reading datasets, since decryption uses a KeyRetriever, but I'm not sure what will need to be done once datasets support writing. A rough sketch of how the factory itself might be created and initialized is included after the quoted message below.

On 2020/09/03 14:11:51, "Itamar Turner-Trauring" <ita...@pythonspeed.com> wrote:
> Hi,
>
> I'm looking into implementing this, and it seems like there are two parts:
> packaging, but also wrapping the APIs in Python. Is the latter item accurate?
> If so, any examples of similar existing wrapped APIs, or should I just come
> up with something on my own?
>
> Context:
> https://github.com/apache/arrow/pull/4826
> https://issues.apache.org/jira/browse/ARROW-8040
>
> Thanks,
>
> —Itamar
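To make the key-tools part of the proposal more concrete, here is a rough sketch of what creating and initializing the factory could look like from Python. Everything here is hypothetical: PropertiesDrivenCryptoFactory, the kms_client argument, and the getFileEncryptionProperties/getFileDecryptionProperties parameters are guesses modeled on the Java key-tools API, not a finalized design.

```
# Hypothetical sketch only -- none of these classes exist in pyarrow yet;
# names are modeled on the Java key-tools API in parquet-mr.
import pandas as pd

class MyKmsClient:
    """A user-supplied client for their key management service.
    wrap/unwrap mirror the KmsClient interface in the Java key tools."""
    def wrap_key(self, key_bytes, master_key_id):
        ...  # ask the KMS to encrypt a data encryption key
    def unwrap_key(self, wrapped_key, master_key_id):
        ...  # ask the KMS to decrypt a wrapped data encryption key

# Initialize the factory once with the KMS connection.
factory = PropertiesDrivenCryptoFactory(kms_client=MyKmsClient())

df = pd.DataFrame({"ssn": ["123-45-6789"], "name": ["Alice"]})

# Writing: encrypt the "ssn" column and the footer with separate master keys.
df.to_parquet(
    "example.parquet",
    engine="pyarrow",
    encryption=factory.getFileEncryptionProperties(
        footer_key="footer_master_key_id",
        column_keys={"column_master_key_id": ["ssn"]},
    ),
)

# Reading: the factory's KeyRetriever resolves keys from the file metadata,
# so the caller doesn't pass key material explicitly.
df2 = pd.read_parquet(
    "example.parquet",
    engine="pyarrow",
    decryption=factory.getFileDecryptionProperties(),
)
```

The point of the factory-plus-properties split is that the low-level API stays usable on its own, while key tools take care of key material and KMS interaction for users who want that.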