alexandrejuma opened a new issue #8731: Advanced Superset cache management
URL: https://github.com/apache/incubator-superset/issues/8731
 
 
   **Is your feature request related to a problem? Please describe.**
   
   Superset's cache management appears to follow a traditional strategy: 
time-based cache expiry combined with a proactive, time-based cache warm-up 
mechanism, so that when users open a dashboard or slice the results are 
already cached.
   
   We have a use case where we pre-process a number of heavy near-real-time 
aggregations using Kafka Streams, and we wish to push these results directly to 
our Redis cluster, which would also be the cache store for Superset. This would 
remove the need for Superset to proactively query the data source to refresh 
the cache.
   
   We know that pre-processing the data and storing it in a supported fast 
serving layer for Superset to refresh its cache from, even with short cache 
timeouts, is feasible. However, there is always some query delay no matter how 
fast the serving layer is, and the stream-processing applications that produce 
and store the results are not synchronized with the Superset cache mechanism 
that refreshes its data (it is interval-based).
   
   Our requirement is to have our monitoring solution updated every single 
minute (aggregations will have 5m buckets in a sliding window updated every 1m).
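
To make the windowing concrete, here is a minimal plain-Python sketch of 5m windows that advance every 1m (what Kafka Streams calls a hopping window). It is not Kafka Streams code; the names `window_starts` and `aggregate` and the sample events are made up for illustration, and timestamps are assumed to be epoch seconds.

```python
from collections import defaultdict

WINDOW_S = 5 * 60   # window length: 5 minutes
ADVANCE_S = 60      # a new window starts every minute


def window_starts(ts: int) -> list[int]:
    """Return the start time of every window that contains timestamp ts."""
    latest = ts - ts % ADVANCE_S              # newest window start <= ts
    earliest = latest - WINDOW_S + ADVANCE_S  # oldest window still covering ts
    return [s for s in range(earliest, latest + 1, ADVANCE_S) if s >= 0]


def aggregate(events: list[tuple[int, float]]) -> dict[int, float]:
    """Sum event values per window; each event lands in up to 5 windows."""
    totals: dict[int, float] = defaultdict(float)
    for ts, value in events:
        for start in window_starts(ts):
            totals[start] += value
    return dict(totals)


# Hypothetical sample data: (epoch seconds, value)
events = [(30, 1.0), (90, 2.0), (310, 4.0)]
totals = aggregate(events)
```

Every minute the newest window closes and yields one result per chart, which is the value we would like to land in the cache without a further query.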
   
   **Describe the solution you'd like**
   
   - Be able to manage the Superset cache directly (i.e., push pre-processed 
results directly to the cache)
   - Be able to push notifications to the Superset cache manager so it can 
refresh its data on demand (instead of relying only on the time-based cache 
expiry/refresh mechanism)
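
The two requests above could look roughly like the sketch below. It is an in-memory stand-in for a Redis-backed cache, not Superset's actual API: the class `PushableCache`, its methods `push` and `notify`, and the `slow_query` loader are all hypothetical names chosen for illustration.

```python
import time
from typing import Any, Callable, Optional


class PushableCache:
    """In-memory stand-in for a Redis-backed cache (hypothetical, not Superset's API)."""

    def __init__(self, loader: Callable[[str], Any], ttl_s: float) -> None:
        self._loader = loader   # queries the data source on a miss
        self._ttl_s = ttl_s
        self._entries: dict[str, tuple[Any, float]] = {}  # key -> (value, expiry)

    def push(self, key: str, value: Any, now: Optional[float] = None) -> None:
        """Direct management: an external producer writes a ready-made result."""
        now = time.time() if now is None else now
        self._entries[key] = (value, now + self._ttl_s)

    def notify(self, key: str, now: Optional[float] = None) -> None:
        """Notification: the producer signals new data and the cache reloads the key."""
        self.push(key, self._loader(key), now)

    def get(self, key: str, now: Optional[float] = None) -> Any:
        """Return a fresh entry, falling back to the loader when missing or expired."""
        now = time.time() if now is None else now
        entry = self._entries.get(key)
        if entry is None or entry[1] <= now:
            self.notify(key, now)
            entry = self._entries[key]
        return entry[0]


calls = []  # records every trip to the (pretend) data source


def slow_query(key: str) -> str:
    calls.append(key)
    return f"queried:{key}"


cache = PushableCache(slow_query, ttl_s=60)
cache.push("chart-1", "precomputed", now=0)
pushed = cache.get("chart-1", now=10)   # served without touching the loader
cache.notify("chart-1", now=20)         # refresh on demand, not on a timer
refreshed = cache.get("chart-1", now=30)
```

With `push`, the stream-processing job decides when the cache changes; with `notify`, Superset still runs the query but only when told that new data exists.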
   
   **Describe alternatives you've considered**
   
   Because I'm not sure whether what I just said makes sense for the Superset 
roadmap, we're also working on the following solutions:
   - Testing Druid ingesting Kafka-bound raw data without any pre-processing, 
leveraging standard Druid ingestion-level pre-aggregation
   - Testing Druid ingesting Kafka-bound pre-processed aggregations (produced 
with Kafka Streams)
   
   We would then rely on the Superset -> Druid integration and the regular 
cache mechanism to serve results. Assuming Superset can keep delivering valid 
cached results while the warm-up mechanism proactively updates the cache (i.e., 
it does not block cache hits while it is working), this should also be 
transparent to the user in terms of loading times.
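
The "don't block cache hits" assumption can be sketched as follows. This is only an illustration of the behaviour we are hoping for, not a description of how Superset's warm-up works: `StaleWhileRefreshCache`, `run_warmup` and the `query` loader are hypothetical names, and the warm-up is called by hand here rather than from a scheduler.

```python
from typing import Any, Callable


class StaleWhileRefreshCache:
    """Hypothetical sketch: expired entries stay readable until a refresh replaces them."""

    def __init__(self, loader: Callable[[str], Any], ttl_s: float) -> None:
        self._loader = loader
        self._ttl_s = ttl_s
        self._entries: dict[str, tuple[Any, float]] = {}  # key -> (value, expiry)
        self.pending: set[str] = set()                    # keys awaiting refresh

    def get(self, key: str, now: float) -> Any:
        entry = self._entries.get(key)
        if entry is None:
            # Cold miss: nothing to serve, so this one request must wait.
            self._store(key, now)
            return self._entries[key][0]
        if entry[1] <= now:
            # Stale hit: answer immediately and leave the reload to the warm-up job.
            self.pending.add(key)
        return entry[0]

    def run_warmup(self, now: float) -> None:
        """Reload every key flagged as stale; meant to run outside the request path."""
        for key in sorted(self.pending):
            self._store(key, now)
        self.pending.clear()

    def _store(self, key: str, now: float) -> None:
        self._entries[key] = (self._loader(key), now + self._ttl_s)


version = {"n": 0}


def query(key: str) -> str:
    version["n"] += 1
    return f"{key}-v{version['n']}"


cache = StaleWhileRefreshCache(query, ttl_s=60)
first = cache.get("chart-1", now=0)    # cold miss, loads v1
stale = cache.get("chart-1", now=61)   # expired, still served from cache
cache.run_warmup(now=61)               # refresh outside the request path loads v2
fresh = cache.get("chart-1", now=62)
```

Only the very first request for a key waits on the data source; every later request is answered from the cache, at the cost of briefly serving the previous result.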
   
   I think the objective is to avoid any user request going directly to the 
data source (in this case Druid).
   
   **Additional context**
   
   Our solution, besides having a very short refresh interval (every 1m), has 
to be able to load a large number of visualizations simultaneously (think of an 
operations center) and with high concurrency (lots of users). We're thinking of 
embedding Superset visualizations in a 3rd-party application and leveraging 
Superset's caching mechanism to deliver a blazing-fast experience.

