[GitHub] [ozone] duongkame commented on pull request #4194: HDDS-7734. Implement symmetric SecretKeys lifescycle management in SCM

via GitHub Thu, 23 Feb 2023 22:56:46 -0800


duongkame commented on PR #4194:
URL: https://github.com/apache/ozone/pull/4194#issuecomment-1442884341


   > You're welcome! In the meantime one thought does bother me a bit, and 
would like to understand the intent for that...
   > 
   > In the SecretKeyManager and the SecretKeyState, I have some doubts around 
the mechanism that generates a new key and does the rotation... it is hard to 
tell what exactly, it is just a bad feeling so far, so let me think out loud 
here: If I understand correctly, SecretKeyManager will decide it should rotate, 
once the current key is older than 1 day. Once the rotation happens eventually 
after 1d, the SecretKeyManager#checkAndRotate() method creates a new list of 
secret keys, it filters the oldest expired key from the list, and 
SecretKeyState#updateKeys is called with the keys in a sorted order, where the 
newest key is the first item in the list. This call to updateKeys is replicated 
to the other SCMs, so all SCMs eventually will have the new set of keys.
   > 
   > So far this is clean on its own, the confusion on my side comes after 
this. The way of implementation suggests that the second proposal the pull 
model will be used on the DNs. How this will look like? I guess if the DN runs 
into a token, that is newer then the newest key it has, it will pull the list 
of the keys from SCM. While OMs will pull periodically from SCM. If my 
understanding is correct, then operations wise we are good, and probably my 
mind stuck with the push model and the pre-generated and distributed next key.
   > 
   > During the thought experiment around this, I started to feel that it might 
not be correct to have a SecretKeyState that gets its internal representation 
from outside, and also misses proper encapsulation... With the current code 
this might not be a problem, but here are the concerns:
   > 
   > 1. the internal list is guarded by locking, but its reference is leaking 
via the getSortedKeys() method. Though the actual implementation used is an 
unmodifiable list as of now.
   > 2. updateKeys can override the full internal state
   > 
   > The first one might not be a problem at all at the end of the day, it 
might be, if someone carelessly change the internal representation from 
unmodifiable list to a mutable list impementation. Second one on the other hand 
there might be a more interesting question... Let's do some thought experiment: 
Let's say we have one SCM (SCM1) that is stopped for an extended period of 
time, and its raft logs are so old that when it gets back in service it 
downloads a new snapshot, and since that snapshot there was no rotation 
happening. At this time SCM2 is the leader, but after SCM1 comes up and 
syncronizes its data and becomes a proper follower, SCM2 goes down before there 
was a rotation of secret keys, and during election SCM1 becomes the elected 
leader.
   > 
   > In this scenario will SCM1 have the secretKeys properly? As based on the 
code my assumption is that in this case SCM1 will have 1 secret key in its list 
generated at startup (let's assume every key that was persisted has expired 
already), with that it will issue tokens, and DNs will be able to validate 
those tokens, but once keys are rotated, the list of old keys will be 
overridden in all SCMs.
   > 
   > After this has happened, let's assume a DN goes down, and comes up due to 
a restart for any reason. In this case if there is a client that has a token 
issued by SCM2 back when it was the leader, was not be able to read data from 
the restarted DN, as that DN will not be able to fetch the formerly valid token 
from anywhere, while those DNs that have the token in memory will work fine 
further causing intermittent read failures.
   > 
   > Do I miss something, or this is a legit scenario with this approach of 
managing the secret keys? I see this happening because the keys are managed in 
a file, not in rocksDB, and so when SCM1 downloads the snapshot of the data to 
replay logs from that snapshot, it might not replay the transaction that 
updates the file, and the file is not in the snapshot, but putting the secret 
keys to rocksDB with that to the snapshot will hurt us later when it comes to 
conforming with FISMA for example.
   
   Thanks again for being thorough, @fapifta. 
   
   #1 is just a design choice. I was considering between a) creating a copy 
list to serve a query, or b) leveraging immutability. I think in Ozone we're so 
easy to go with a) and that's something I notice when doing a memory scaling 
analysis in SCM. Today, buffers, e.g. pipeline datanodes, are just copied over 
and that seems unnecessary. Immutability is a good alternative to deal with 
concurrency and should be used where possible. (Yes, I know, it's not that easy 
to spread that awareness in Java, i.e. there's no good implicit type for 
immutable data, which would be easier in the functional world). 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [ozone] duongkame commented on pull request #4194: HDDS-7734. Implement symmetric SecretKeys lifescycle management in SCM

Reply via email to