john-bodley opened a new issue #15396:
URL: https://github.com/apache/superset/issues/15396


   **Is your feature request related to a problem? Please describe.**
   
   As mentioned in the Apache Superset Contributor Meetup on 
[2021-06-25](https://docs.google.com/document/d/1xRLqmUn-G7WiPe8qZnfuvbrYHZ1jmk9aSv9Bhjis2tg/edit#heading=h.uh575p50wagev),
 at Airbnb we implemented a custom memoization function 
(`memoize_with_user_lock`) which provides per-user global locking (based on a 
cache key) to reduce the Presto query load associated with the 
`latest_partition` call. This ensures that numerous instances of the same query 
are not sent to the underlying engine simultaneously (which may or may not 
dedupe them).
   
   Though the cache is now at the per-user level to prevent deadlocks 
(necessary for our implementation, given that our Presto cluster queue is 
faceted by user and thus there is no guarantee that queries will be executed 
globally in FIFO order), we still noticed a significant decrease in the number 
of queries when users invoked dashboard filters.
   
   **Describe the solution you'd like**
   
   I'm not claiming this solution will work globally, but I felt there was 
merit in sharing the code. Note that it is implemented for Redis, which 
provides the locking primitive.
   
   ```python
   from functools import wraps
   from typing import Any, Callable, Optional
   
   from flask import g
   from redis.lock import Lock
   
   from superset import cache_manager
   
   
   def memoize_with_user_lock(
       timeout: Optional[int] = None,
   ) -> Callable[..., Any]:
       """
       Decorator for memoization which leverages per-user global locking to
       prevent simultaneous execution.
   
       :param timeout: If set, will cache for that amount of time (in seconds)
       :returns: A memoize decorator
       """
   
       def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
           @wraps(func)
           def wrapper(*args: Any, **kwargs: Any) -> Any:
               """
               Cache the result of a function, via memoization, taking its
               arguments and the user into account in the cache key.
   
               Rather than re-implementing the somewhat non-trivial
               Flask-Caching memoization logic, we simply re-wrap the memoized
               function and serialize the calls if uncached to prevent
               simultaneous execution.
   
               :see: https://github.com/sh4nks/flask-caching
               """
   
               @cache_manager.data_cache.memoize(
                   make_name=lambda fname: str((fname, g.user.username)),
                   timeout=timeout,
               )
               def memoized_func(*args: Any, **kwargs: Any) -> Any:
                   return func(*args, **kwargs)
   
               # The cache key associated with the non-memoized function, which
               # is also used for locking, is additionally keyed by user to
               # prevent deadlocks.
               cache_key = memoized_func.make_cache_key(
                   memoized_func.uncached, *args, **kwargs
               )
   
               # First check whether the Flask-Caching memoized function is
               # cached. Note the cached value is never `None` by construction.
               rv = cache_manager.data_cache.get(cache_key)
   
               if rv is None:
                   # Per-user sequential execution of the Flask-Caching
                   # memoized function.
                   with Lock(
                       cache_manager.data_cache.cache._write_client,  # pylint: disable=protected-access
                       cache_key,
                   ):
                       # Calling the Flask-Caching memoized function caches
                       # the result.
                       rv = memoized_func(*args, **kwargs)
   
               return rv
   
           return wrapper
   
       return decorator
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


