I am in favor of this in general. My only concern is the alias based one. This is the default implementation for token based authentication flows and is possibly more secure than a derby based implementation.
Let's begin with some questions about the use of derby: 1. I believe the data is stored in the clear in files in the directory - is this true? 2. Are all tokens only hashed including SSO cookie tokens? 3. Credential stores can be synchronized across instances by simply copying them - can you do that with persisted derby data as well? If not, what is needed to do this? Copying them may not be something that we need to do if this is only for non-HA but it is something that folks may be doing in the wild. We certainly do that for generic aliases already, so if tokens are in there then they are being sync'd as well. On Mon, Nov 13, 2023 at 8:02 AM Sandor Molnar <smol...@cloudera.com.invalid> wrote: > Hello folks! > > I'm starting this thread because I am convinced we should remove the > following TokenStateService implementations: > - AliasBasedtokenStateService > - ZookeeperTokenStateService > - JournalBasedTokenStateService > > The reason behind this idea for the last two implementations in the above > list is quite simple: > > 1. ZookeeperTokenStateService was our first approach to provide HA support > for Knox Token Integration. However, our internal tests have shown that ZK > is just simply not the right tool for that feature. Eventual consistency is > only one part of this issue (we could make this work with re-tried ZK > queries). Performance-wise ZK proved to be a wrong decision. In our test > environment, where hundreds of tokens were generated in every minute, ZK > was not enough to scale. > > 2. JournalBasedTokensSateService is > 2.1 insecure (it stores plain data on the FS), > 2.2 missing features (no impersonation or SSO Cookie support) > > In the case of the AliasBasedtokenStateService, the reason is not that > simple. It's true, that keystore-related operations are expensive, but the > background thread that actually persists the token state improved a lot in > this respect. However, it's still slow compared to the supported databases > we added for the JDBC implementation when it comes to token verification. > In addition to that, the current implementation creates at least 3 aliases > per token, which makes the __gateway really big in case of lots of tokens. > Even worse, we try to read all tokens into memory from __gateway credential > store in a background thread that also consumes memory, CPU which we could > avoid. > To be honest, I don't see any reason why could not we achieve the same > functionality with a pre-configured Derby database that stores its data in > a dedicated sub-folder within the KNOX_DATA_DIR. This would be the default > choice, so users will still not need to configure everything for the > KnoxToken service even if token state management is enabled. > > We could also write a small KNOX CLI command to migrate existing tokens > from keystores to Derby upon upgrade. > > Advantages of the above: > - only one implementation will be kept (JDBCTokenStateService) which is > proven to be robust enough and can scale well > - easier to maintain the product > - easier to troubleshoot in PROD environments (Derby has very powerful > tools to connect and run SQL queries) > - eliminate background threads which make debugging hard, > resource-consuming, and adds complexity > - the non-desired side effects of reading lots of tokens into memory from > __gateway credential store that may make the > > I'm curious about what you think of the above and I'd like to hear back > from you with your suggestions and ideas. > > Cheers, > Sandor >