ggershinsky commented on code in PR #13225:
URL: https://github.com/apache/iceberg/pull/13225#discussion_r2490399490
##########
core/src/main/java/org/apache/iceberg/rest/RESTTableOperations.java:
##########
@@ -166,7 +208,67 @@ public void commit(TableMetadata base, TableMetadata
metadata) {
@Override
public FileIO io() {
- return io;
+ if (encryptionKeyId == null) {
+ return io;
+ }
+
+ if (encryptingFileIO == null) {
+ encryptingFileIO = EncryptingFileIO.combine(io, encryption());
+ }
+
+ return encryptingFileIO;
+ }
+
+ @Override
+ public EncryptionManager encryption() {
+ if (encryptionManager != null) {
+ return encryptionManager;
+ }
+
+ if (encryptionKeyId != null) {
+ if (kmsClient == null) {
+ throw new RuntimeException(
+ "Cant create encryption manager, because key management client is
not set");
+ }
+
+ Map<String, String> tableProperties = Maps.newHashMap();
+ tableProperties.put(TableProperties.ENCRYPTION_TABLE_KEY,
encryptionKeyId);
+ tableProperties.put(
+ TableProperties.ENCRYPTION_DEK_LENGTH,
String.valueOf(encryptionDekLength));
+ encryptionManager =
+ EncryptionUtil.createEncryptionManager(
+ encryptedKeysFromMetadata, tableProperties, kmsClient);
+ } else {
+ return PlaintextEncryptionManager.instance();
+ }
+
+ return encryptionManager;
+ }
+
+ private void encryptionPropsFromMetadata(TableMetadata metadata) {
Review Comment:
> for the Hive case it's only the table properties that are stored in HMS,
right?
Correct. That's why the Hive catalog provides partial security (data
confidentiality only; not integrity). REST catalogs can provide full security
for encrypted tables.
> Is there a follow-up there or is this sufficient for the threat model?
Afaik, HMS is not optimized for JSON storage. But maybe someone in the
community will take on storing the full metadata object there, to improve table
security. Having only the table properties is barely sufficient. I think we
should recommend REST catalogs for encrypted tables.
> REST catalog checking those table properties are correct before returning
them
To check this, the REST catalog would need to keep these properties in an
independent storage / db? If so, maybe the catalog can just keep the whole
metadata object there? (as a json file in an independent storage or as an
object in a DB/store)
Ok, here's my current understanding of the advantages of each approach - all
corrections/additions are welcome.
Storing the encryption properties
- implementation: it might be easier to store two properties instead of a
metadata object (in certain databases, backing the catalog)
Storing the whole metadata object
- better security (table integrity is guaranteed)
- implementation: it might be easier to store the whole metadata object (in
object stores /nosql db's /regular file systems, backing the catalog)
- REST spec addition is concise, simple and easy to understand (see the
first pass at https://github.com/apache/iceberg/pull/14486). An implementation
is either "secure" or not. Going deep into threat model levels and encryption
property details will make the addition much bigger and more complex; might be
out of place for this spec.
(also, a simple security spec means a lower risk of wrong implementations /
security vulnerabilities)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]