michael-s-molina commented on pull request #18181: URL: https://github.com/apache/superset/pull/18181#issuecomment-1041424136
Thanks for your comments @john-bodley @nytai @ktmud. Let me start with this: > 4. I realize that the ship has somewhat sailed, but per [SIP-0](https://github.com/apache/superset/issues/5602) it mentions, > > > Any of the following should be considered a major change: > > > > * Any major new feature, subsystem, or piece of functionality > > * **Any change that impacts the public interfaces of the project** > > > > What are the "public interfaces" of the project? All of the following are public interfaces that people build around: > > > > * Visualization types > > * **Form data for saved dashboards and charts** > > * ... > > which would indicate that this feature likely should have been presented as a SIP to allow for discussion regarding design/implementation and sign off from the community. You're totally right about this. We did a fairly extensive architectural discussion internally at Preset where we created a document with all the architectural options, discussed all the possibilities, and presented the recommended approach. We should have published the result of this work as a SIP for community discussion and sign-off. I apologize for that. > 1. It seems like using a (semi-)persistent store for tackling the problem of long URLs seems someone akin to using a hammer to crack a nut—even though it works there may likely be simpler methods which don't require persistence of state. For example was the ideal of compressing the JSON form data considered? This would likely remedy the problem in far simpler way whilst easily maintaining backwards compatibility etc. Compressing JSON form data was the first option considered to resolve the problem. It was discarded because of the size of the dashboard and Explore states. Even with sophisticated compression algorithms, it was really easy to create a dashboard with many filter values or a chart with many configurations that exceeded the browser URL limit. > 2. Flask caching (regardless of the backing store) is primarily used as a cache—ideally with TTL functionality to prevent overflow. Using a cache for (semi-)persistent storage seems somewhat of an anti-pattern and likely violates the implicit understanding of how Superset's cache is used, i.e., in theory you should be able to obliterate the entire cache without impacting the _functionality_ of the application (_performance_ will clearly be degraded). If we plan on using a store to remedy the problem we likely should lean on Superset's metadata database—augmenting the shortened URL functionality. During the evaluation of the architectural options, we considered a database table, Flask-Caching, and even DynamoDB or MongoDB to store this type of information. We listed pros and cons for each option and ultimately decided to use Flask-Caching for some reasons: - It supports expiration automatically. This is important because the nature of the state that is stored is different and expires at different times. One example would be generating a link for a dashboard state vs transferring state between a dashboard and Explore. - Is already a dependency on Superset - We could use Redis (with persistence enabled) and have all the benefits of memory-based storage. This is the default configuration at Preset. - It gives the flexibility to use the most appropriate technology for each deployment. For deployments that don't use Redis, we could fall back to the file system or even provide a database backend to store the data in our metadata database. - Storing the information in a database table would also require implementing expiration management which is automatically provided by Flask-Caching and we would probably still need to cache the information because of performance reasons. > 3. It seems like there are aspects of this change which are still in flux, i.e., questions regarding backwards compatibility, state changes, etc. and thus I would consider this experimental in nature and likely should have been behind a feature flag (and defaulted to disabled) whilst the design/implementation was hardened. We didn't consider this an experimental feature, in fact we considered it as a fix to a long-standing problem that was affecting the users. We just did 2 improvements after this PR: one to [generate the keys more efficiently](https://github.com/apache/superset/pull/18576) and another to [better deal with expired keys](https://github.com/apache/superset/pull/18624). > Particularly troubling is that the files system cache seems to be the default, so users going to upgrade superset would likely discover this issue after deploying. That was considered during the work and we added the following in `UPDATING.MD`: * [17536](https://github.com/apache/superset/pull/17536): introduced a key-value endpoint to store dashboard filter state. This endpoint is backed by Flask-Caching and the default configuration assumes that the values will be stored in the file system. If you are already using another cache backend like Redis or Memchached, you'll probably want to change this setting in superset_config.py. The key is FILTER_STATE_CACHE_CONFIG and the available settings can be found in Flask-Caching [docs](https://flask-caching.readthedocs.io/en/latest/). * [17882](https://github.com/apache/superset/pull/17882): introduced a key-value endpoint to store Explore form data. This endpoint is backed by Flask-Caching and the default configuration assumes that the values will be stored in the file system. If you are already using another cache backend like Redis or Memchached, you'll probably want to change this setting in superset_config.py. The key is EXPLORE_FORM_DATA_CACHE_CONFIG and the available settings can be found in Flask-Caching [docs](https://flask-caching.readthedocs.io/en/latest/). > Certain deployment setups involve running a single pod/host for dev/test/staging envs and multiple pods for production, so having the file system cache be the default would not present any issues in a single pod/host environment but would be a huge issue in multi-pod/multi-host environment. The form data request could get routed to a pod/host that does not have the data in file system cache. When defining the default backend for Flask-Caching, we considered that the default configuration should account for single pod/host because this is the most common scenario. We also didn't want to introduce more configuration burden for the default deployment, that's why we opted for the file system. Multi-pod/multi-host environments should change the default configuration as stated in `UPDATING.MD`. I’m sorry about this having caused problems in your deployments. I agree this should have been surfaced as a SIP to gather all relevant context from the early discussions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
