michael-s-molina commented on pull request #18181:
URL: https://github.com/apache/superset/pull/18181#issuecomment-1041424136


   Thanks for your comments @john-bodley @nytai @ktmud. Let me start with this: 
   
   > 4. I realize that the ship has somewhat sailed, but per 
[SIP-0](https://github.com/apache/superset/issues/5602) it mentions,
   > 
   > > Any of the following should be considered a major change:
   > > 
   > > * Any major new feature, subsystem, or piece of functionality
   > > * **Any change that impacts the public interfaces of the project**
   > > 
   > > What are the "public interfaces" of the project? All of the following 
are public interfaces that people build around:
   > > 
   > > * Visualization types
   > > * **Form data for saved dashboards and charts**
   > > * ...
   > 
   > which would indicate that this feature likely should have been presented 
as a SIP to allow for discussion regarding design/implementation and sign off 
from the community.
   
   You're totally right about this. We did a fairly extensive architectural 
discussion internally at Preset where we created a document with all the 
architectural options, discussed all the possibilities, and presented the 
recommended approach. We should have published the result of this work as a SIP 
for community discussion and sign-off. I apologize for that.
   
   > 1. It seems like using a (semi-)persistent store for tackling the problem 
of long URLs seems someone akin to using a hammer to crack a nut—even though it 
works there may likely be simpler methods which don't require persistence of 
state. For example was the ideal of compressing the JSON form data considered? 
This would likely remedy the problem in far simpler way whilst easily 
maintaining backwards compatibility etc.
   
   Compressing JSON form data was the first option considered to resolve the 
problem. It was discarded because of the size of the dashboard and Explore 
states. Even with sophisticated compression algorithms, it was really easy to 
create a dashboard with many filter values or a chart with many configurations 
that exceeded the browser URL limit.
   
   > 2. Flask caching (regardless of the backing store) is primarily used as a 
cache—ideally with TTL functionality to prevent overflow. Using a cache for 
(semi-)persistent storage seems somewhat of an anti-pattern and likely violates 
the implicit understanding of how Superset's cache is used, i.e., in theory you 
should be able to obliterate the entire cache without impacting the 
_functionality_ of the application (_performance_ will clearly be degraded). If 
we plan on using a store to remedy the problem we likely should lean on 
Superset's metadata database—augmenting the shortened URL functionality.
   
   During the evaluation of the architectural options, we considered a database 
table, Flask-Caching, and even DynamoDB or MongoDB to store this type of 
information. We listed pros and cons for each option and ultimately decided to 
use Flask-Caching for some reasons:
   - It supports expiration automatically. This is important because the nature 
of the state that is stored is different and expires at different times. One 
example would be generating a link for a dashboard state vs transferring state 
between a dashboard and Explore.
   - Is already a dependency on Superset
   - We could use Redis (with persistence enabled) and have all the benefits of 
memory-based storage. This is the default configuration at Preset.
   - It gives the flexibility to use the most appropriate technology for each 
deployment. For deployments that don't use Redis, we could fall back to the 
file system or even provide a database backend to store the data in our 
metadata database.
   - Storing the information in a database table would also require 
implementing expiration management which is automatically provided by 
Flask-Caching and we would probably still need to cache the information because 
of performance reasons.
   
   > 3. It seems like there are aspects of this change which are still in flux, 
i.e., questions regarding backwards compatibility, state changes, etc. and thus 
I would consider this experimental in nature and likely should have been behind 
a feature flag (and defaulted to disabled) whilst the design/implementation was 
hardened.
   
   We didn't consider this an experimental feature, in fact we considered it as 
a fix to a long-standing problem that was affecting the users. We just did 2 
improvements after this PR: one to [generate the keys more 
efficiently](https://github.com/apache/superset/pull/18576) and another to 
[better deal with expired keys](https://github.com/apache/superset/pull/18624).
   
   > Particularly troubling is that the files system cache seems to be the 
default, so users going to upgrade superset would likely discover this issue 
after deploying. 
   
   That was considered during the work and we added the following in 
`UPDATING.MD`:
   
   * [17536](https://github.com/apache/superset/pull/17536): introduced a 
key-value endpoint to store dashboard filter state. This endpoint is backed by 
Flask-Caching and the default configuration assumes that the values will be 
stored in the file system. If you are already using another cache backend like 
Redis or Memchached, you'll probably want to change this setting in 
superset_config.py. The key is FILTER_STATE_CACHE_CONFIG and the available 
settings can be found in Flask-Caching 
[docs](https://flask-caching.readthedocs.io/en/latest/).
   * [17882](https://github.com/apache/superset/pull/17882): introduced a 
key-value endpoint to store Explore form data. This endpoint is backed by 
Flask-Caching and the default configuration assumes that the values will be 
stored in the file system. If you are already using another cache backend like 
Redis or Memchached, you'll probably want to change this setting in 
superset_config.py. The key is EXPLORE_FORM_DATA_CACHE_CONFIG and the available 
settings can be found in Flask-Caching 
[docs](https://flask-caching.readthedocs.io/en/latest/).
   
   > Certain deployment setups involve running a single pod/host for 
dev/test/staging envs and multiple pods for production, so having the file 
system cache be the default would not present any issues in a single pod/host 
environment but would be a huge issue in multi-pod/multi-host environment. The 
form data request could get routed to a pod/host that does not have the data in 
file system cache.
   
   When defining the default backend for Flask-Caching, we considered that the 
default configuration should account for single pod/host because this is the 
most common scenario. We also didn't want to introduce more configuration 
burden for the default deployment, that's why we opted for the file system. 
Multi-pod/multi-host environments should change the default configuration as 
stated in `UPDATING.MD`.
   
   I’m sorry about this having caused problems in your deployments. I agree 
this should have been surfaced as a SIP to gather all relevant context from the 
early discussions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to