+1

On 2023/12/07 22:37:55 Mike Carey wrote:
> +1
> 
> We really need this!
> 
> On 12/7/23 2:21 PM, Glenn Justo Galvizo wrote:
> > Every time a query is issued to AsterixDB, the query must undergo 
> > compilation. If the same query is run repeatedly, this query must be 
> > recompiled each and every time. A query plan cache can help AsterixDB 
> > achieve a lower floor on the end-to-end time by storing the job 
> > specifications for previously compiled queries, ultimately skipping the AST 
> > rewriting and Algebricks compilation of a previously executed query.
> >
> > (APE copied from contributor Sushrut Borkor)
> >
> > This APE is about adding a query plan cache to AsterixDB. More 
> > specifically, this query plan cache is a hash table whose hits let us skip 
> > 1) the AST rewriting, 2) the entire Algebricks pipeline, from plan 
> > translation through optimization, and 3) the Hyracks job generation. Each 
> > key of this hash table consists of:
> >      • AST String. We key on this rather than the original (pre-parse) 
> > query string because it is resilient to minor changes in the query, such 
> > as adding spaces or empty lines.
> >      • SessionConfig. For example, if the user runs a query, changes part 
> > of the session configuration (e.g. the preferred output format), and reruns 
> > the query, this prevents the second query from being served from the cache.
> >      • Config, to capture the effects of any SET statements used.
> >      • Active Dataverse, e.g., as defined in a USE statement.
> >      • Result Set ID, which distinguishes among queries in multi-statement 
> > requests.
> >
> > The value of each hash table entry consists of:
> >      • Hyracks Job Spec to be submitted to Hyracks.
> >      • Cached warnings. Since we skip compilation when serving queries from 
> > the cache, we cannot detect compile time warnings. To get around this, we 
> > cache warnings issued during rewriting and compilation, and then reissue 
> > them for cache hits. As a result, line numbers in warnings may be incorrect 
> > for queries answered using the cache.
> >      • Lock. Since the same job spec cannot be run from multiple threads 
> > at once, we include a lock in the cache value. To use a cached job spec, a 
> > thread must acquire this lock, and then release it after the job has 
> > finished running. If the lock is held by another thread, we recompile the 
> > query instead of blocking.
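
The key/value layout and the non-blocking lock described above might be sketched roughly as follows. This is only an illustration in Python (AsterixDB itself is Java), and every name in it (PlanCacheKey, PlanCacheEntry, run_query, compile_fn, execute_fn) is hypothetical rather than taken from the APE's patch.

```python
import threading
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class PlanCacheKey:
    """Hypothetical key: the five components listed in the APE."""
    ast_string: str
    session_config: str                  # e.g. the preferred output format
    config: Tuple[Tuple[str, str], ...]  # effects of SET statements
    active_dataverse: str
    result_set_id: int


@dataclass
class PlanCacheEntry:
    """Hypothetical value: job spec, cached warnings, and a per-entry lock."""
    job_spec: Any
    warnings: List[str]
    lock: threading.Lock = field(default_factory=threading.Lock)


def run_query(
    cache: Dict[PlanCacheKey, PlanCacheEntry],
    key: PlanCacheKey,
    compile_fn: Callable[[], Tuple[Any, List[str]]],
    execute_fn: Callable[[Any], Any],
) -> Tuple[Any, List[str], bool]:
    """Return (result, warnings, served_from_cache)."""
    entry = cache.get(key)
    # Try to take the entry's lock without blocking.
    if entry is not None and entry.lock.acquire(blocking=False):
        try:
            # Cache hit: reissue the cached warnings and run the cached spec.
            return execute_fn(entry.job_spec), list(entry.warnings), True
        finally:
            # Release only after the job has finished running.
            entry.lock.release()
    # Cache miss, or the cached spec is in use by another thread: recompile.
    job_spec, warnings = compile_fn()
    if entry is None:
        cache[key] = PlanCacheEntry(job_spec, warnings)
    return execute_fn(job_spec), warnings, False
```

A real implementation would also need to synchronize access to the map itself; the sketch only models the per-entry lock.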
> >
> > The proposed changes are the following:
> >
> > Interface:
> > We introduce two new statements for controlling cache access:
> >      • “SET `compiler.querycache.bypass` "true";” forces the current query 
> > to ignore the cache.
> >      • “SET `compiler.querycache.clear` "true";” clears all cache entries. 
> > The current query may still insert into the cache.
> > We also add a boolean HTTP API parameter bypass_cache which does the same 
> > thing as the first SET statement above. Finally, the parameter 
> > query.cache.capacity can be configured in the [cc] section of the cc.conf 
> > file to control the maximum cache size before replacement.
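
To make the interplay of the bypass flag, the clear flag, and the capacity limit concrete, here is a rough Python sketch. It rests on assumptions the APE text does not state: the replacement policy is taken to be LRU, and a bypassing query is taken to skip insertion as well as lookup. The class and method names are hypothetical.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable, Tuple


class BoundedPlanCache:
    """Hypothetical bounded cache; LRU replacement is an assumption."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity  # stands in for query.cache.capacity
        self._entries: "OrderedDict[Hashable, Any]" = OrderedDict()

    def __len__(self) -> int:
        return len(self._entries)

    def get_or_compile(
        self,
        key: Hashable,
        compile_fn: Callable[[], Any],
        bypass: bool = False,
        clear: bool = False,
    ) -> Tuple[Any, bool]:
        """Return (plan, served_from_cache)."""
        if clear:
            # Clears all entries; the current query may still insert below.
            self._entries.clear()
        if bypass:
            # Assumption: bypass skips both the lookup and the insertion.
            return compile_fn(), False
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key], True
        plan = compile_fn()
        self._entries[key] = plan
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return plan, False
```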
> >
> > Changes:
> >      • Compilation logic is changed in the source code since we skip 
> > rewriting and compilation for cache hits.
> >      • Hints are now included in the AST string, so that queries differing 
> > only in their hints no longer map to the same cache entry.
> >      • A bug is fixed where the AST string of WINDOW expressions did not 
> > include FROM LAST or IGNORE NULLS.
> >
> > See https://issues.apache.org/jira/projects/ASTERIXDB/issues/ASTERIXDB-3183
> > for the JIRA issue, as well as
> > https://cwiki.apache.org/confluence/display/ASTERIXDB/APE+2%3A+Query+Plan+Cache
> > for more details.
> >
> > Please vote on this APE. We will keep this open for 72 hours and pass with 
> > either 3 votes or a majority of positive votes.
