smengcl commented on code in PR #4188:
URL: https://github.com/apache/ozone/pull/4188#discussion_r1550044937


##########
hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/rpc/RpcClient.java:
##########
@@ -1195,7 +1196,8 @@ public OzoneOutputStream createKey(
         .setReplicationConfig(replicationConfig)
         .addAllMetadata(metadata)
         .setAcls(getAclList())
-        .setLatestVersionLocation(getLatestVersionLocation);
+        .setLatestVersionLocation(getLatestVersionLocation)
+        .setOwnerName(ownerName);

Review Comment:
   Actually I wonder if we really need to add this owner field to every single 
new key created in Ozone.
   
   Arguably we could just leave this as unset (empty) to indicate the default 
behavior, which is to inherit parent's owner. This way we don't bloat every 
single key with this extra owner string in its proto message.
   
   Cons of setting `owner` for every single new key:
   
   1. Slight perf hit for every new key/dir creation.
   
   2. This also implies `chown` on a bucket/dir would be recursive by default. 
Makes `chown` much slower. (Where in the case of not setting `owner` for every 
key, we don't need to recurse by default because the sub-keys and sub-dirs 
should just inherit the parent owner)
   
   3. Because `owner` is a proto string, it can be wasting quite a lot of space 
repeating the same name string. (numeric user ID mapping can solve that)
   
   4. When a cluster is upgraded to a new Ozone version with key ownership 
support, all existing keys would natually have empty owners as well. Unless we 
add a lengthy step during OM upgrade to add owner to all existing keys, which I 
don't think is the best idea).
   
   Pros:
   
   1. On the other hand I get that the the owner field in key info is required 
for "sticky bit" behavior to work properly with Ranger authorizer (previously 
this requires a workaround, HDDS-9701).
   2. Writer user is not necessarily the same as the parent owner.
   
   Discussions:
   
   1. Is there some other reasons we must set this owner field for new keys and 
dirs?
   3. How much perf hit would there actually be to add this owner field for 
every key? Try this in a micro benchmark with and without S3?
   4. Is it reasonable to add an config to allow not setting owner field for 
new keys and dirs by default? Or, only add owner field if the parent dir have 
the "sticky bit".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to