ktmud commented on pull request #18181:
URL: https://github.com/apache/superset/pull/18181#issuecomment-1041907138
Originally I had some reservation on choosing Flask-Caching as the backend
for key-value storage well, but out of practically it does seem to be the
easiest way forward. However, I'd still like to respond to the arguments for
using it just to be prudent:
> It supports expiration automatically. This is important because the nature
of the state that is stored is different and expires at different times. One
example would be generating a link for a dashboard state vs transferring state
between a dashboard and Explore.
Sharable URLs aren't supposed to expire. To minimize disruptions on the user
side (a URL expiring at an unpredictable time), we'd have to configure an LRU
cache with a relatively long expiration time (in the range of 6 months or 1
year). For URLs do not need to be shared but are simply used to transfer states
between pages, the cache is indeed short-lived therefore can just be
implemented with Superset's regular cache storage (although we may need to
change the default cache storage from NullCache to SimpleCache otherwise the
functionality will not work), or using local or server side session storage.
> Is already a dependency on Superset
SQL database is also already a dependency.
> We could use Redis (with persistence enabled) and have all the benefits of
memory-based storage. This is the default configuration at Preset.
Even with persistence enabled, Redis is not suitable for places sensitive to
data losses. You may still have data loss depending on how frequent snapshots
are created and disaster recovery can be slow as when a Redis server restarts,
it needs to load the full snapshot in memory.
> It gives the flexibility to use the most appropriate technology for each
deployment. For deployments that don't use Redis, we could fall back to the
file system or even provide a database backend to store the data in our
metadata database. Storing the information in a database table would also
require implementing expiration management which is automatically provided by
Flask-Caching and we would probably still need to cache the information because
of performance reasons.
If we are to provide a database backend, then why not just make it the
default storage? The only additional work comparing to implementing a custom
Flask-Caching backend is setting and updating expire date when visit. I doubt
performance will be an issue with adequate indexing. Most of the time you are
querying and writing one record after all.
----
Again, I agree Flask-Caching is the practical choice and could work well if
configured properly. However, I do believe we should at least remove
`FileSystemCache` as the default storage. It should not be a recommended
storage even for single-machine deployments. Plus `UPDATING.md` is very easy to
miss---people will not read it unless it's a major version bump or they see
problems. We should probably set it to `NullCache` and throw an exception at
startup. We can still use `FileSystemCache` (or `SimpleCache`) when
`FLASK_ENV==development` so someone trying Superset out can still get it up and
running without changing configs manually:
```python
EXPLORE_FORM_DATA_CACHE_CONFIG: CacheConfig = (
{"CACHE_TYPE": "SimpleCache"} if DEBUG
else { "CACHE_TYPE": "NullCache" }
)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]