ktmud commented on pull request #18181:
URL: https://github.com/apache/superset/pull/18181#issuecomment-1041907138


   Originally I had some reservation on choosing Flask-Caching as the backend 
for key-value storage well, but out of practically it does seem to be the 
easiest way forward. However, I'd still like to respond to the arguments for 
using it just to be prudent:
   
   > It supports expiration automatically. This is important because the nature 
of the state that is stored is different and expires at different times. One 
example would be generating a link for a dashboard state vs transferring state 
between a dashboard and Explore.
   
   Sharable URLs aren't supposed to expire. To minimize disruptions on the user 
side (a URL expiring at an unpredictable time), we'd have to configure an LRU 
cache with a relatively long expiration time (in the range of 6 months or 1 
year). For URLs do not need to be shared but are simply used to transfer states 
between pages, the cache is indeed short-lived therefore can just be 
implemented with Superset's regular cache storage (although we may need to 
change the default cache storage from NullCache to SimpleCache otherwise the 
functionality will not work), or using local or server side session storage.
   
   > Is already a dependency on Superset
   
   SQL database is also already a dependency.
   
   > We could use Redis (with persistence enabled) and have all the benefits of 
memory-based storage. This is the default configuration at Preset.
   
   Even with persistence enabled, Redis is not suitable for places sensitive to 
data losses. You may still have data loss depending on how frequent snapshots 
are created and disaster recovery can be slow as when a Redis server restarts, 
it needs to load the full snapshot in memory.
   
   > It gives the flexibility to use the most appropriate technology for each 
deployment. For deployments that don't use Redis, we could fall back to the 
file system or even provide a database backend to store the data in our 
metadata database. Storing the information in a database table would also 
require implementing expiration management which is automatically provided by 
Flask-Caching and we would probably still need to cache the information because 
of performance reasons.
   
   If we are to provide a database backend, then why not just make it the 
default storage? The only additional work comparing to implementing a custom 
Flask-Caching backend is setting and updating expire date when visit. I doubt 
performance will be an issue with adequate indexing. Most of the time you are 
querying and writing one record after all.
   
   ----
   
   Again, I agree Flask-Caching is the practical choice and could work well if 
configured properly. However, I do believe we should at least remove 
`FileSystemCache` as the default storage. It should not be a recommended 
storage even for single-machine deployments. Plus `UPDATING.md` is very easy to 
miss---people will not read it unless it's a major version bump or they see 
problems. We should probably set it to `NullCache` and throw an exception at 
startup. We can still use `FileSystemCache` (or `SimpleCache`) when 
`FLASK_ENV==development` so someone trying Superset out can still get it up and 
running without changing configs manually:
   
   ```python
   EXPLORE_FORM_DATA_CACHE_CONFIG: CacheConfig = (
     {"CACHE_TYPE": "SimpleCache"}  if DEBUG
     else { "CACHE_TYPE": "NullCache" }
   )
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to