Agreed! On Tue, Nov 14, 2023 at 7:15 AM Sandor Molnar <smol...@cloudera.com.invalid> wrote:
> Thanks, Sandeep for your insights too! > > To be honest, I would be surprised if someone used the JournalBased or the > ZK TokenStateService implementations for storing tokens (even in the wild). > In the case of AliasBasedTSS, it might be true. That's why I suggested that > we could write a small KNOX CLI command that migrates tokens from the > __gateway.jks credential store into the new Derby Database. > We could even make this an auto-executed step: when token state service is > initiated and it's using the pre-configured Derby database, we may check if > there is any tokens stored in __gateway.jks and migrate them. This way > it'd be seamless for existing tokens. > I'm not sure that the migration tool should handle ZK and/or files. If I > had to vote now, I'd say no, it's not necessary. > > Sandor > > On Tue, Nov 14, 2023 at 1:05 PM Sandeep Moré <moresand...@gmail.com> > wrote: > > > Thank you for starting the thread Sandor. You bring up valid points and > > pain that we had to go through with these implementations :) > > I am in favor of removing these as well. My concern is around folks > > currently using any of these impls. Should we deprecate it this release > > (but keep the support and make JDBCTokenStateService default) and remove > > the support completely in the next release? This is because this is a > > breaking change. > > > > My thoughts. > > > > On Mon, Nov 13, 2023 at 8:02 AM Sandor Molnar > <smol...@cloudera.com.invalid > > > > > wrote: > > > > > Hello folks! > > > > > > I'm starting this thread because I am convinced we should remove the > > > following TokenStateService implementations: > > > - AliasBasedtokenStateService > > > - ZookeeperTokenStateService > > > - JournalBasedTokenStateService > > > > > > The reason behind this idea for the last two implementations in the > above > > > list is quite simple: > > > > > > 1. ZookeeperTokenStateService was our first approach to provide HA > > support > > > for Knox Token Integration. However, our internal tests have shown that > > ZK > > > is just simply not the right tool for that feature. Eventual > consistency > > is > > > only one part of this issue (we could make this work with re-tried ZK > > > queries). Performance-wise ZK proved to be a wrong decision. In our > test > > > environment, where hundreds of tokens were generated in every minute, > ZK > > > was not enough to scale. > > > > > > 2. JournalBasedTokensSateService is > > > 2.1 insecure (it stores plain data on the FS), > > > 2.2 missing features (no impersonation or SSO Cookie support) > > > > > > In the case of the AliasBasedtokenStateService, the reason is not that > > > simple. It's true, that keystore-related operations are expensive, but > > the > > > background thread that actually persists the token state improved a lot > > in > > > this respect. However, it's still slow compared to the supported > > databases > > > we added for the JDBC implementation when it comes to token > verification. > > > In addition to that, the current implementation creates at least 3 > > aliases > > > per token, which makes the __gateway really big in case of lots of > > tokens. > > > Even worse, we try to read all tokens into memory from __gateway > > credential > > > store in a background thread that also consumes memory, CPU which we > > could > > > avoid. > > > To be honest, I don't see any reason why could not we achieve the same > > > functionality with a pre-configured Derby database that stores its data > > in > > > a dedicated sub-folder within the KNOX_DATA_DIR. This would be the > > default > > > choice, so users will still not need to configure everything for the > > > KnoxToken service even if token state management is enabled. > > > > > > We could also write a small KNOX CLI command to migrate existing tokens > > > from keystores to Derby upon upgrade. > > > > > > Advantages of the above: > > > - only one implementation will be kept (JDBCTokenStateService) which is > > > proven to be robust enough and can scale well > > > - easier to maintain the product > > > - easier to troubleshoot in PROD environments (Derby has very powerful > > > tools to connect and run SQL queries) > > > - eliminate background threads which make debugging hard, > > > resource-consuming, and adds complexity > > > - the non-desired side effects of reading lots of tokens into memory > from > > > __gateway credential store that may make the > > > > > > I'm curious about what you think of the above and I'd like to hear back > > > from you with your suggestions and ideas. > > > > > > Cheers, > > > Sandor > > > > > >