kbendick commented on a change in pull request #2675:
URL: https://github.com/apache/iceberg/pull/2675#discussion_r645363982



##########
File path: api/src/main/java/org/apache/iceberg/ManifestFile.java
##########
@@ -179,6 +181,13 @@ default boolean hasDeletedFiles() {
    */
   List<PartitionFieldSummary> partitions();
 
+  /**
+   * Returns metadata about how this manifest file is encrypted, or null if the file is stored in plain text.
+   */
+  default ByteBuffer keyMetadata() {
+    return null;
+  }

Review comment:
       Is there any way to avoid using `null` as the return value for 
`default` methods?
   
   This particular case isn't too bad, as the `null` return value is documented 
and has semantic meaning. But I've noticed that we return `null` from a number 
of methods in base classes and from default methods in interfaces, often where 
the child classes are expected to override the method. When they don't, that 
can lead to bad results.
   
   So partly I wanted to bring the issue up because it seems to be a somewhat 
unsafe code style that is employed from time to time, and I've noticed it in a 
few PRs recently.
   
   [I recently fixed a 
bug](https://github.com/apache/iceberg/pull/2630/files#diff-1ae8e9490fe1a4b6da8842c1c313fa57e68a674e186bfeace1c763bee1381faaL72-L74)
 arising from a similar situation. In that case, 
`BaseMetastoreTableOperations#tableName` should have been `abstract`, as all 
catalogs need to implement it. It was not overridden by 
`NessieTableOperations`, which would have led to all of the issues one can 
imagine with unexpected nulls in a PR that was being merged at the time.
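   
   To make the distinction concrete, here is a minimal sketch contrasting the 
two situations. All type and method names below are hypothetical stand-ins for 
illustration, not the actual Iceberg classes:

```java
import java.nio.ByteBuffer;

// Hypothetical types for illustration only; these are not the Iceberg classes.
interface OptionalMetadata {
  // A null default is reasonable here: "absent" is a documented, meaningful state.
  default ByteBuffer keyMetadata() {
    return null;
  }
}

abstract class BaseOperations {
  // Risky pattern: null stands in for "subclasses must override this".
  // A subclass that forgets the override still compiles and only fails at runtime.
  protected String tableNameOrNull() {
    return null;
  }

  // Safer pattern: the compiler rejects any concrete subclass that omits it.
  protected abstract String tableName();
}

final class ExampleOperations extends BaseOperations {
  @Override
  protected String tableName() {
    return "db.example_table";
  }
}

final class PlainTextManifest implements OptionalMetadata {}
```

   In this sketch, `ExampleOperations` silently inherits the `null` from 
`tableNameOrNull()`, whereas leaving out `tableName()` would be a compile 
error.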
   

##########
File path: api/src/main/java/org/apache/iceberg/ManifestFile.java
##########
@@ -179,6 +181,13 @@ default boolean hasDeletedFiles() {
    */
   List<PartitionFieldSummary> partitions();
 
+  /**
+   * Returns metadata about how this manifest file is encrypted, or null if the file is stored in plain text.
+   */
+  default ByteBuffer keyMetadata() {
+    return null;
+  }

Review comment:
       Having looked through the PR further, I think the use of `null` here is 
definitely justified, especially as the utility methods in `ByteBuffers` 
already handle null, so there's not much extra null handling, and the physical 
type needs to be `byte[]` for Avro.
   
   Feel free to resolve this. I've seen a number of PRs recently where the 
`null` return value was more of a stand-in for `abstract`, and I felt it would 
be good to draw attention to that in general, as it seems to be creeping into 
the codebase here and there. However, this is not one of those cases. 🙂 

##########
File path: core/src/main/java/org/apache/iceberg/GenericManifestFile.java
##########
@@ -399,6 +415,7 @@ public String toString() {
         .add("deleted_data_files_count", deletedFilesCount)
         .add("deleted_rows_count", deletedRowsCount)
         .add("partitions", partitions)
+        .add("key_metadata", keyMetadata == null ? "null" : "(redacted)")

Review comment:
       Given that we're adding a number of encryption-related PRs, most of 
which handle very sensitive data (encryption keys), would it make sense to 
turn this into a utility function, such as 
`EncryptionUtils.toRedactedString(byte[] value)`?
   
   That would give us a unified way of redacting, and it could also reduce the 
number of null checks in the code.
   
   It would also let people customize how keys are redacted (say, in a fork or 
via an override) so that SREs can better assist customers. For example, I can 
imagine there _might_ be utility in showing the first 2 or 4 bytes, so that one 
can check that a key is definitively not the same as another one. I would defer 
to the experts on how insecure it is to log any portion of the key, or a hash 
of a portion of it, at all.
   
   I'm not sure how much benefit people would get from custom redaction when 
stringifying (whether for checking that the key metadata is definitively not 
the same, or for something else), but I'm throwing it out there because I'm 
curious whether there is any utility in not redacting entirely, from a 
usability / debugging standpoint.
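   
   A minimal sketch of what such a helper could look like. The class and 
method names are hypothetical (taken from the suggestion above, not an existing 
Iceberg API), and the `ByteBuffer` overload is an assumption added so the 
helper can be called directly on the `keyMetadata` field:

```java
import java.nio.ByteBuffer;

// Hypothetical helper; not an existing Iceberg class.
final class EncryptionUtils {
  private EncryptionUtils() {}

  // Returns "null" for absent values and a fixed placeholder otherwise,
  // so that no key material reaches a log line or toString() output.
  static String toRedactedString(byte[] value) {
    return value == null ? "null" : "(redacted)";
  }

  // Assumed convenience overload for ByteBuffer-typed fields.
  static String toRedactedString(ByteBuffer value) {
    return value == null ? "null" : "(redacted)";
  }
}
```

   With something like this, the line in `toString()` could become 
`.add("key_metadata", EncryptionUtils.toRedactedString(keyMetadata))`, and any 
change to the redaction policy would live in one place.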



