zeroshade commented on PR #596:
URL: https://github.com/apache/arrow-go/pull/596#issuecomment-3684227012

   Sorry for the delay here, I was on vacation all last week.
   
   I see what the issue is here. What pyarrow exposes is the high-level API, 
which uses a KMS to manage keys by wrapping them. It generates random bytes to 
use as the key and then wraps that key with the `wrap_key` function of the KMS 
implementation (`unwrap_key` reverses this when reading). 
   
   If we look at your mock example:
   
   ```python
   class MockKmsClient(pe.KmsClient):
       def __init__(self, kms_connection_configuration):
           super().__init__()
   
       def wrap_key(self, key_bytes, master_key_identifier):
           return base64.b64encode(key_bytes)
   
       def unwrap_key(self, wrapped_key, master_key_identifier):
           return base64.b64decode(wrapped_key)
   ```
   
   Comparing that with the unit tests from the Arrow repo for pyarrow Parquet 
encryption, I was able to figure out what it is doing. Essentially, the 
`key_bytes` are the random bytes that were actually used to encrypt the data. 
Those bytes are the `key` that Go needs for decrypting. In your example 
with this `MockKmsClient`, you're storing the key bytes (base64-encoded) 
directly in the metadata, so the following should work on the Go side for 
decrypting:
   
   ```go
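   import (
       "encoding/base64"
       "encoding/json"
   )
   
   type metadataKeyRetriever struct{}
   
   // GetKey extracts the data key that the mock wrap_key stored in the metadata.
   func (metadataKeyRetriever) GetKey(keyMetadata []byte) string {
       var keyMeta struct {
           WrappedDEK string `json:"wrappedDEK"`
       }
   
       if err := json.Unmarshal(keyMetadata, &keyMeta); err != nil {
           panic(err)
       }
       // The mock client only base64-encodes the key, so decoding recovers it.
       byts, err := base64.StdEncoding.DecodeString(keyMeta.WrappedDEK)
       if err != nil {
           panic(err)
       }
   
       return string(byts)
   }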
   ```
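   
   As a rough sanity check from the Python side, here is a minimal sketch of 
the same decoding step. The sample metadata is hypothetical and only carries 
the `wrappedDEK` field used above; `key_from_metadata`, `raw_key` and 
`sample_metadata` are names made up for illustration, and the real metadata 
written by pyarrow contains more fields than this.
   
```python
import base64
import json


def key_from_metadata(key_metadata: bytes) -> bytes:
    """Return the raw data key stored by the mock client's wrap_key."""
    material = json.loads(key_metadata)
    # The mock wrap_key only base64-encodes key_bytes, so decoding the
    # "wrappedDEK" field gives back the key that encrypted the data.
    return base64.b64decode(material["wrappedDEK"])


# Hypothetical sample: a 16-byte key wrapped the way MockKmsClient does it.
raw_key = b"0123456789abcdef"
sample_metadata = json.dumps(
    {"wrappedDEK": base64.b64encode(raw_key).decode("ascii")}
).encode("utf-8")
```
   
   Calling `key_from_metadata(sample_metadata)` should give back `raw_key`, 
which is the value the Go key retriever would need to return.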
   
   Another option might be to combine the `key_bytes` with the master key 
bytes so that the key itself isn't stored so plainly in the metadata (or to 
rely on some external service). I was able to get the above to work, so think 
of it like this:
   
   For Python -
   
   * `wrap_key` takes the Key ID (`master_key_identifier`) and the actual 
random bytes used as the encryption key (`key_bytes`) and outputs what goes 
into the key metadata. These `key_bytes` are what Go needs since we only expose 
the low-level API currently.
   * `unwrap_key` takes the result from `wrap_key` and the Key ID 
(`master_key_identifier`) and returns the actual key to use for decrypting. 
When Go writes the file, it would need to output the properly formatted JSON 
blob that pyarrow expects as the key metadata, so that the Python `unwrap_key` 
can determine the actual `key_bytes` from that metadata.
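   
   To illustrate the wrap/unwrap contract described above, here is a minimal 
sketch of a pair of functions that mix the key with master key bytes instead 
of storing it directly. Everything in it is an assumption for illustration: 
`MASTER_KEYS`, the `footer_key` ID and the XOR masking are made up, the 
masking is a toy and not real encryption, and in actual use these would be 
methods on a `pe.KmsClient` subclass backed by a real KMS or a proper key 
wrapping scheme.
   
```python
import base64
import hashlib

# Hypothetical master keys indexed by Key ID; a real client would ask a KMS.
MASTER_KEYS = {"footer_key": b"not-a-real-master-key"}


def _mask(master_key_identifier: str, length: int) -> bytes:
    # Derive `length` mask bytes from the master key (illustration only).
    master_key = MASTER_KEYS[master_key_identifier]
    return hashlib.shake_256(master_key).digest(length)


def wrap_key(key_bytes: bytes, master_key_identifier: str) -> bytes:
    # XOR the data key with the mask so the raw key is not stored as-is.
    mask = _mask(master_key_identifier, len(key_bytes))
    masked = bytes(k ^ m for k, m in zip(key_bytes, mask))
    return base64.b64encode(masked)


def unwrap_key(wrapped_key: bytes, master_key_identifier: str) -> bytes:
    # Reverse wrap_key: decode, then XOR with the same mask.
    masked = base64.b64decode(wrapped_key)
    mask = _mask(master_key_identifier, len(masked))
    return bytes(k ^ m for k, m in zip(masked, mask))
```
   
   With something like this on the Python side, the Go key retriever would 
have to apply the same unmasking to the `wrappedDEK` value before returning 
the key, rather than just base64-decoding it.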
   
   Does this help?


