Re: [VOTE][APE] Query Plan Cache

Murtadha Al-Hubail Fri, 08 Dec 2023 11:34:52 -0800

+1 to this proposal in general. However, I believe there are few
implementation details that need to be considered:


1. We should have a cluster-wide configuration to completely disable/enable
query plan caching.
2. Having compiler.querycache.clear as a SET statement might not be ideal
for a multi-user environment. You don't want any user to just clear the
cache for all other users. It might be better to have this as a part of an
admin REST API that can be easily authenticated.
3. We will need to understand the memory footprint of this cache to
properly set a default value for its capacity or potentially set the
capacity/limit dynamically based on the available JVM memory and the
cache's footprint.

Cheers,
Murtadha

On Fri, Dec 8, 2023 at 9:15 PM Till Westmann <ti...@apache.org> wrote:

> +1
>
> On 2023/12/07 22:37:55 Mike Carey wrote:
> > +1
> >
> > We really need this!
> >
> > On 12/7/23 2:21 PM, Glenn Justo Galvizo wrote:
> > > Every time a query is issued to AsterixDB, the query must undergo
> compilation. If the same query is run repeatedly, this query must be
> recompiled each and every time. A query plan cache can help AsterixDB
> achieve a lower floor on the end-to-end time by storing the job
> specifications for previously compiled queries, ultimately skipping the AST
> rewriting and Algebricks compilation of a previously executed query.
> > >
> > > (APE copied from contributor Sushrut Borkor)
> > >
> > > This APE is about adding a query plan cache to AsterixDB. More
> specifically, this query plan cache acts as a hash table that skips 1) the
> AST rewriting, 2) the entire Algebricks plan translation to Algebricks
> optimization, and 3) the Hyracks job generation. The keys of this hash
> table are:
> > >      • AST String. We cache this instead of the original query string
> before parsing because it is resilient to minor changes in the query, such
> as adding spaces or empty lines.
> > >      • SessionConfig. For example, if the user runs a query, changes
> part of the session configuration (e.g. the preferred output format), and
> reruns the query, this prevents the second query from being served from the
> cache.
> > >      • Config, to capture the effects of used SET statements.
> > >      • Active Dataverse, e.g., as defined in a USE statement.
> > >      • Result Set ID, which distinguishes among queries in
> multi-statement requests.
> > >
> > > While the values of each hash table entry are:
> > >      • Hyracks Job Spec to be submitted to Hyracks.
> > >      • Cached warnings. Since we skip compilation when serving queries
> from the cache, we cannot detect compile time warnings. To get around this,
> we cache warnings issued during rewriting and compilation, and then reissue
> them for cache hits. As a result, line numbers in warnings may be incorrect
> for queries answered using the cache.
> > >      • Lock. Since running the same job from multiple threads does not
> work, we include a lock in the cache value. To use a cached job spec, a
> thread must acquire this lock, and then release it after the job has
> finished running. If the lock is held by another thread, we recompile the
> query instead of blocking.
> > >
> > > The proposed changes are the following:
> > >
> > > Interface:
> > > We introduce two new statements for controlling cache access:
> > >      • “SET `compiler.querycache.bypass` "true";” forces the current
> query to ignore the cache.
> > >      • “SET `compiler.querycache.clear` "true";” clears all cache
> entries. The current query may still insert into the cache.
> > > We also add a boolean HTTP API parameter bypass_cache which does the
> same thing as the first SET statement above. Finally, the parameter
> query.cache.capacity can be configured in the [cc] section of the cc.conf
> file to control the maximum cache size before replacement.
> > >
> > > Changes:
> > >      • Compilation logic is changed in the source code since we skip
> rewriting and compilation for cache hits.
> > >      • Hints are now included in the AST string to prevent incorrect
> cache lookups that would otherwise miss the hints.
> > >      • A bug is fixed where the AST string of WINDOW expressions did
> not include FROM LAST or IGNORE NULLS.
> > >
> > > Seehttps://
> issues.apache.org/jira/projects/ASTERIXDB/issues/ASTERIXDB-3183  for the
> JIRA issue, as well ashttps://
> cwiki.apache.org/confluence/display/ASTERIXDB/APE+2%3A+Query+Plan+Cache
> for more details.
> > >
> > > Please vote on this APE. We will keep this open for 72 hours and pass
> with either 3 votes or a majority of positive votes.
>

Re: [VOTE][APE] Query Plan Cache

Reply via email to