zeroshade commented on PR #596:
URL: https://github.com/apache/arrow-go/pull/596#issuecomment-3684227012
Sorry for the delay here, I was on vacation all last week.
I see what's going on here. What pyarrow exposes is the high-level API, which
uses a KMS to manage the keys by wrapping them. It generates random bytes to
use as the key, then wraps and unwraps that key using the `wrap_key` and
`unwrap_key` functions of the KMS implementation.
If we look at your mock example:
```python
import base64
import pyarrow.parquet.encryption as pe

class MockKmsClient(pe.KmsClient):
    def __init__(self, kms_connection_configuration):
        super().__init__()

    def wrap_key(self, key_bytes, master_key_identifier):
        # No real wrapping: the data key is only base64-encoded.
        return base64.b64encode(key_bytes)

    def unwrap_key(self, wrapped_key, master_key_identifier):
        return base64.b64decode(wrapped_key)
```
Comparing that with the unit tests from the Arrow repo for pyarrow parquet
encryption, I was able to figure out what it is doing. Essentially, the
`key_bytes` are the random bytes that actually got used to encrypt the data.
Those bytes are the `key` that Go needs for decrypting. In your example with
this MockKmsClient, the key bytes are stored (base64-encoded) directly in the
key metadata, so the following should work on the Go side for decrypting:
```go
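import (
	"encoding/base64"
	"encoding/json"
)

// metadataKeyRetriever recovers the key from the key metadata pyarrow wrote.
type metadataKeyRetriever struct{}

func (metadataKeyRetriever) GetKey(keyMetadata []byte) string {
	var keyMeta struct {
		WrappedDEK string `json:"wrappedDEK"`
	}
	if err := json.Unmarshal(keyMetadata, &keyMeta); err != nil {
		panic(err)
	}
	// The mock wrap_key only base64-encodes, so decoding yields the raw key.
	byts, err := base64.StdEncoding.DecodeString(keyMeta.WrappedDEK)
	if err != nil {
		panic(err)
	}
	return string(byts)
}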
```
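To see why decoding `wrappedDEK` gives back the raw key, here is a small
stdlib-only Python sketch of the round trip. It is an illustration, not
pyarrow's code: `build_key_metadata` is a hypothetical stand-in that models
only the `wrappedDEK` field, and the real key metadata carries more fields
than are shown here.

```python
import base64
import json
import os


def mock_wrap_key(key_bytes: bytes) -> bytes:
    """Same transformation as MockKmsClient.wrap_key: base64 only, no encryption."""
    return base64.b64encode(key_bytes)


def build_key_metadata(wrapped_key: bytes) -> bytes:
    """Hypothetical stand-in for the JSON key metadata (only `wrappedDEK` modeled)."""
    return json.dumps({"wrappedDEK": wrapped_key.decode("ascii")}).encode("utf-8")


def key_from_metadata(key_metadata: bytes) -> bytes:
    """Mirror of the Go retriever: parse the JSON, then base64-decode `wrappedDEK`."""
    doc = json.loads(key_metadata)
    return base64.b64decode(doc["wrappedDEK"])


data_key = os.urandom(16)  # stands in for the random key pyarrow generates
metadata = build_key_metadata(mock_wrap_key(data_key))
recovered = key_from_metadata(metadata)
```

Here `recovered` equals `data_key`, which is why the Go retriever can return
the decoded bytes as the decryption key.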
Another option might be to combine the `key_bytes` with the master key bytes
so that the key itself isn't so easily readable from the metadata (or to rely
on something external to the file). I was able to get the above to work, so
think of it like this:
For Python -
* `wrap_key` takes the Key ID (`master_key_identifier`) and the actual
random bytes used as the encryption key (`key_bytes`) and outputs what goes
into the key metadata. These `key_bytes` are what Go needs since we only expose
the low-level API currently.
* `unwrap_key` takes the result from `wrap_key` and the Key ID
(`master_key_identifier`) and returns the actual key to use for decrypting. For
the reverse direction, Go would need to write the properly formatted JSON blob
that pyarrow expects as the key metadata, so that the Python `unwrap_key` can
recover the actual `key_bytes` from it.
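
As a rough illustration of the "combine with the master key" option, the
sketch below masks the key with a pad derived from the master key before
base64-encoding it. Everything in it is an assumption made for the example:
the `MASTER_KEYS` table, the `footer_key` identifier, and the XOR-with-SHA-256
scheme. It is a toy that only keeps the key from sitting in the metadata in
the clear and is not real key wrapping; the Go side would need the same
master key and would have to apply the same unmasking step.

```python
import base64
import hashlib

# Hypothetical in-memory master keys, looked up by master_key_identifier.
MASTER_KEYS = {"footer_key": b"0123456789012345"}


def _pad(master_key_identifier: str, length: int) -> bytes:
    """Derive a masking pad from the master key (toy scheme, not secure)."""
    digest = hashlib.sha256(MASTER_KEYS[master_key_identifier]).digest()
    if length > len(digest):
        raise ValueError("key longer than the 32-byte pad")
    return digest[:length]


def wrap_key(key_bytes: bytes, master_key_identifier: str) -> bytes:
    """XOR the data key with the pad, then base64-encode the result."""
    pad = _pad(master_key_identifier, len(key_bytes))
    masked = bytes(k ^ p for k, p in zip(key_bytes, pad))
    return base64.b64encode(masked)


def unwrap_key(wrapped_key: bytes, master_key_identifier: str) -> bytes:
    """Reverse wrap_key: base64-decode, then XOR with the same pad."""
    masked = base64.b64decode(wrapped_key)
    pad = _pad(master_key_identifier, len(masked))
    return bytes(m ^ p for m, p in zip(masked, pad))


key = b"sixteen byte key"
wrapped = wrap_key(key, "footer_key")
```

With this variant, base64-decoding the stored value alone no longer yields the
key; `unwrap_key(wrapped, "footer_key")` is needed to get `key` back.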
Does this help?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]