Repository: asterixdb Updated Branches: refs/heads/master cb3ca25f3 -> 8bbf08131
[ASTERIXDB-2455][DOC] Deprecate AQL documentations - user model changes: no - storage format changes: no - interface changes: no details: - Create [Deprecated] section and move AQL docs to there. - Move some docs from /aql directory to /sqlpp directory. Change-Id: I677dd7a8d114197eaa2ae93e0405184526b31a03 Reviewed-on: https://asterix-gerrit.ics.uci.edu/2977 Sonar-Qube: Jenkins <[email protected]> Reviewed-by: Ian Maxon <[email protected]> Tested-by: Jenkins <[email protected]> Contrib: Jenkins <[email protected]> Integration-Tests: Jenkins <[email protected]> Project: http://git-wip-us.apache.org/repos/asf/asterixdb/repo Commit: http://git-wip-us.apache.org/repos/asf/asterixdb/commit/8bbf0813 Tree: http://git-wip-us.apache.org/repos/asf/asterixdb/tree/8bbf0813 Diff: http://git-wip-us.apache.org/repos/asf/asterixdb/diff/8bbf0813 Branch: refs/heads/master Commit: 8bbf08131bd679baab5aea11cd860c420bbc9216 Parents: cb3ca25 Author: Taewoo Kim <[email protected]> Authored: Mon Sep 24 17:34:50 2018 -0700 Committer: Taewoo Kim <[email protected]> Committed: Mon Sep 24 19:13:45 2018 -0700 ---------------------------------------------------------------------- .../src/site/markdown/aql/filters.md | 147 ------------ .../src/site/markdown/aql/fulltext.md | 114 ---------- .../src/site/markdown/aql/similarity.md | 227 ------------------- .../src/site/markdown/sqlpp/filters.md | 147 ++++++++++++ .../src/site/markdown/sqlpp/fulltext.md | 114 ++++++++++ .../src/site/markdown/sqlpp/similarity.md | 227 +++++++++++++++++++ asterixdb/asterix-doc/src/site/site.xml | 22 +- 7 files changed, 499 insertions(+), 499 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/asterixdb/blob/8bbf0813/asterixdb/asterix-doc/src/site/markdown/aql/filters.md ---------------------------------------------------------------------- diff --git a/asterixdb/asterix-doc/src/site/markdown/aql/filters.md 
b/asterixdb/asterix-doc/src/site/markdown/aql/filters.md deleted file mode 100644 index 6b8e00f..0000000 --- a/asterixdb/asterix-doc/src/site/markdown/aql/filters.md +++ /dev/null @@ -1,147 +0,0 @@ -<!-- - ! Licensed to the Apache Software Foundation (ASF) under one - ! or more contributor license agreements. See the NOTICE file - ! distributed with this work for additional information - ! regarding copyright ownership. The ASF licenses this file - ! to you under the Apache License, Version 2.0 (the - ! "License"); you may not use this file except in compliance - ! with the License. You may obtain a copy of the License at - ! - ! http://www.apache.org/licenses/LICENSE-2.0 - ! - ! Unless required by applicable law or agreed to in writing, - ! software distributed under the License is distributed on an - ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - ! KIND, either express or implied. See the License for the - ! specific language governing permissions and limitations - ! under the License. - !--> - -# Filter-Based LSM Index Acceleration - -## <a id="toc">Table of Contents</a> - -* [Motivation](#Motivation) -* [Filters in AsterixDB](#FiltersInAsterixDB) -* [Filters and Merge Policies](#FiltersAndMergePolicies) - -## <a id="Motivation">Motivation</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> - -Traditional relational databases usually employ conventional index -structures such as B+ trees due to their low read latency. However, -such traditional index structures use in-place writes to perform -updates, resulting in costly random writes to disk. Today's emerging -applications often involve insert-intensive workloads for which the -cost of random writes prohibits efficient ingestion of -data. Consequently, popular NoSQL systems such as Cassandra, HBase, -LevelDB, BigTable, etc. have adopted Log-Structured Merge (LSM) Trees -as their storage structure. 
LSM-trees avoids the cost of random writes -by batching updates into a component of the index that resides in main -memory -- an *in-memory component*. When the space occupancy of -the in-memory component exceeds a specified threshold, its entries are -*flushed* to disk forming a new component -- a *disk component*. As -disk components accumulate on disk, they are periodically merged -together subject to a *merge policy* that decides when and what to -merge. The benefit of the LSM-trees comes at the cost of possibly -sacrificing read efficiency, but, it has been shown in previous -studies that these inefficiencies can be mostly mitigated. - -AsterixDB has also embraced LSM-trees, not just by using them as -primary indexes, but also by using the same LSM-ification technique -for all of its secondary index structures. In particular, AsterixDB -adopted a generic framework for converting a class of indexes (that -includes conventional B+ trees, R trees, and inverted indexes) into -LSM-based secondary indexes, allowing higher data ingestion rates. In -fact, for certain index structures, our results have shown that using -an LSM-based version of an index can be made to significantly -outperform its conventional counterpart for *both* ingestion -and query speed (an example of such an index being the R-tree for -spatial data). - -Since an LSM-based index naturally partitions data into multiple disk -components, it is possible, when answering certain queries, to exploit -partitioning to only access some components and safely filter out the -remaining components, thus reducing query times. For instance, -referring to our -[TinySocial](primer.html#ADM:_Modeling_Semistructed_Data_in_AsterixDB) -example, suppose a user always retrieves tweets from the -`TweetMessages` dataset based on the `send-time` field (e.g., tweets -posted in the last 24 hours). 
Since there is not a secondary index on -the `send-time` field, the only available option for AsterixDB would -be to scan the whole `TweetMessages` dataset and then apply the -predicate as a post-processing step. However, if disk components of -the primary index were tagged with the minimum and maximum timestamp -values of the objects they contain, we could utilize the tagged -information to directly access the primary index and prune components -that do not match the query predicate. Thus, we could save substantial -cost by avoiding scanning the whole dataset and only access the -relevant components. We simply call such tagging information that are -associated with components, filters. (Note that even if there were a -secondary index on `send-time` field, using filters could save -substantial cost by avoiding accessing the secondary index, followed -by probing the primary index for every fetched entry.) Moreover, the -same filtering technique can also be used with any secondary LSM index -(e.g., an LSM R-tree), in case the query contains multiple predicates -(e.g., spatial and temporal predicates), to obtain similar pruning -power. - -## <a id="FiltersInAsterixDB">Filters in AsterixDB</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> - -We have added support for LSM-based filters to all of AsterixDB's -index types. To enable the use of filters, the user must specify the -filter's key when creating a dataset, as shown below: - -#### Creating a Dataset with a Filter #### - - create dataset Tweets(TweetType) primary key tweetid with filter on send-time; - -Filters can be created on any totally ordered datatype (i.e., any -field that can be indexed using a B+ -tree), such as integers, -doubles, floats, UUIDs, datetimes, etc. 
- -When a dataset with a filter is created, the name of the filter's key -field is persisted in the `Metadata.Dataset` dataset (which is the metadata -dataset that stores the details of each dataset in an AsterixDB -instance) so that DML operations against the dataset can recognize the -existence of filters and can update them or utilize them -accordingly. Creating a dataset with a filter in AsterixDB implies -that the primary and all secondary indexes of that dataset will -maintain filters on their disk components. Once a filtered dataset is -created, the user can use the dataset normally (just like any other -dataset). AsterixDB will automatically maintain the filters and will -leverage them to efficiently answer queries whenever possible (i.e., -when a query has predicates on the filter's key). - -## <a id="FiltersAndMergePolicies">Filters and Merge Policies</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> - -The AsterixDB default merge policy, the prefix merge policy, relies on -component sizes and the number of components to decide which -components to merge. This merge policy has proven to provide excellent -performance for both ingestion and queries. However, when evaluating -our filtering solution with the prefix policy, we observed a behavior -that can reduce filter effectiveness. In particular, we noticed that -under the prefix merge policy, the disk components of a secondary -index tend to be constantly merged into a single component. This is -because the prefix policy relies on a single size parameter for all of -the indexes of a dataset. This parameter is typically chosen based on -the sizes of the disk components of the primary index, which tend to -be much larger than the sizes of the secondary indexes' disk -components. 
This difference caused the prefix merge policy to behave -similarly to the constant merge policy (i.e., relatively poorly) when -applied to secondary indexes in the sense that the secondary indexes -are constantly merged into a single disk component. Consequently, the -effectiveness of filters on secondary indexes was greatly reduced -under the prefix-merge policy, but they were still effective when -probing the primary index. Based on this behavior, we developed a new -merge policy, an improved version of the prefix policy, called the -correlated-prefix policy. The basic idea of this policy is that it -delegates the decision of merging the disk components of all the -indexes in a dataset to the primary index. When the policy decides -that the primary index needs to be merged (using the same decision -criteria as for the prefix policy), then it will issue successive -merge requests to the I/O scheduler on behalf of all other indexes -associated with the same dataset. The end result is that secondary -indexes will always have the same number of disk components as their -primary index under the correlated-prefix merge policy. This has -improved query performance, since disk components of secondary indexes -now have a much better chance of being pruned. http://git-wip-us.apache.org/repos/asf/asterixdb/blob/8bbf0813/asterixdb/asterix-doc/src/site/markdown/aql/fulltext.md ---------------------------------------------------------------------- diff --git a/asterixdb/asterix-doc/src/site/markdown/aql/fulltext.md b/asterixdb/asterix-doc/src/site/markdown/aql/fulltext.md deleted file mode 100644 index 1328ed9..0000000 --- a/asterixdb/asterix-doc/src/site/markdown/aql/fulltext.md +++ /dev/null @@ -1,114 +0,0 @@ -<!-- - ! Licensed to the Apache Software Foundation (ASF) under one - ! or more contributor license agreements. See the NOTICE file - ! distributed with this work for additional information - ! regarding copyright ownership. The ASF licenses this file - ! 
to you under the Apache License, Version 2.0 (the - ! "License"); you may not use this file except in compliance - ! with the License. You may obtain a copy of the License at - ! - ! http://www.apache.org/licenses/LICENSE-2.0 - ! - ! Unless required by applicable law or agreed to in writing, - ! software distributed under the License is distributed on an - ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - ! KIND, either express or implied. See the License for the - ! specific language governing permissions and limitations - ! under the License. - !--> - -# AsterixDB Support of Full-text search queries # - -## <a id="toc">Table of Contents</a> ## - -* [Motivation](#Motivation) -* [Syntax](#Syntax) -* [Creating and utilizing a Full-text index](#FulltextIndex) - -## <a id="Motivation">Motivation</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ## - -Full-Text Search (FTS) queries are widely used in applications where users need to find records that satisfy -an FTS predicate, i.e., where simple string-based matching is not sufficient. These queries are important when -finding documents that contain a certain keyword is crucial. FTS queries are different from substring matching -queries in that FTS queries find their query predicates as exact keywords in the given string, rather than -treating a query predicate as a sequence of characters. For example, an FTS query that finds “rain” correctly -returns a document when it contains “rain” as a word. However, a substring-matching query returns a document -whenever it contains “rain” as a substring, for instance, a document with “brain” or “training” would be -returned as well. - -## <a id="Syntax">Syntax</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ## - -The syntax of AsterixDB FTS follows a portion of the XQuery FullText Search syntax.
-Two basic forms are as follows: - - ftcontains(Expression1, Expression2, {FullTextOption}) - ftcontains(Expression1, Expression2) - -For example, we can execute the following query to find Chirp messages where the `messageText` field includes -“voice” as a word. Please note that an FTS search is case-insensitive. -Thus, "Voice" or "voice" will be evaluated as the same word. - - use TinySocial; - - select element {"chirpId": msg.chirpId} - from ChirpMessages msg - where ftcontains(msg.messageText, "voice", {"mode":"any"}); - -The DDL and DML of TinySocial can be found in [ADM: Modeling Semistructed Data in AsterixDB](../sqlpp/primer-sqlpp.html#ADM:_Modeling_Semistructed_Data_in_AsterixDB). - -The `Expression1` is an expression that should be evaluable as a string at runtime as in the above example -where `msg.messageText` is a string field. The `Expression2` can be a string, an (un)ordered list -of string value(s), or an expression. In the last case, the given expression should be evaluable -into one of the first two types, i.e., into a string value or an (un)ordered list of string value(s). - -The following examples are all valid expressions. - - ... where ftcontains(msg.messageText, "sound") - ... where ftcontains(msg.messageText, "sound", {"mode":"any"}) - ... where ftcontains(msg.messageText, ["sound", "system"], {"mode":"any"}) - ... where ftcontains(msg.messageText, {{"speed", "stand", "customization"}}, {"mode":"all"}) - -The last `FullTextOption` parameter clarifies the given FTS request. If you omit the `FullTextOption` parameter, -then the default value will be set for each possible option. Currently, we only have one option named `mode`. -And as we extend the FTS feature, more options will be added. Please note that the format of `FullTextOption` -is a record, thus you need to put the option(s) in a record `{}`. -The `mode` option indicates whether the given FTS query is a conjunctive (AND) or disjunctive (OR) search request.
-This option can be either `“all”` (AND) or `“any”` (OR). The default value for `mode` is `“all”`. If one specifies `“any”`, -a disjunctive search will be conducted. For example, the following query will find documents whose `messageText` -field contains “sound” or “system”, so a document will be returned if it contains either “sound”, “system”, -or both of the keywords. - - ... where ftcontains(msg.messageText, ["sound", "system"], {"mode":"any"}) - -The other option parameter,`“all”`, specifies a conjunctive search. The following examples will find the documents whose -`messageText` field contains both “sound” and “system”. If a document contains only “sound” or “system” but -not both, it will not be returned. - - ... where ftcontains(msg.messageText, ["sound", "system"], {"mode":"all"}) - ... where ftcontains(msg.messageText, ["sound", "system"]) - -Currently AsterixDB doesn’t (yet) support phrase searches, so the following query will not work. - - ... where ftcontains(msg.messageText, "sound system", {"mode":"any"}) - -As a workaround solution, the following query can be used to achieve a roughly similar goal. The difference is that -the following queries will find documents where `msg.messageText` contains both “sound” and “system”, but the order -and adjacency of “sound” and “system” are not checked, unlike in a phrase search. As a result, the query below would -also return documents with “sound system can be installed.”, “system sound is perfect.”, -or “sound is not clear. You may need to install a new system.” - - ... where ftcontains(msg.messageText, ["sound", "system"], {"mode":"all"}) - ... where ftcontains(msg.messageText, ["sound", "system"]) - - -## <a id="FulltextIndex">Creating and utilizing a Full-text index</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ## - -When there is a full-text index on the field that is being searched, rather than scanning all records, -AsterixDB can utilize that index to expedite the execution of a FTS query.
To create a full-text index, -you need to specify the index type as `fulltext` in your DDL statement. For instance, the following DDL -statement create a full-text index on the `GleambookMessages.message` attribute. Note that a full-text index -cannot be built on a dataset with the variable-length primary key (e.g., string). - - use TinySocial; - - create index messageFTSIdx on GleambookMessages(message) type fulltext; http://git-wip-us.apache.org/repos/asf/asterixdb/blob/8bbf0813/asterixdb/asterix-doc/src/site/markdown/aql/similarity.md ---------------------------------------------------------------------- diff --git a/asterixdb/asterix-doc/src/site/markdown/aql/similarity.md b/asterixdb/asterix-doc/src/site/markdown/aql/similarity.md deleted file mode 100644 index 8118126..0000000 --- a/asterixdb/asterix-doc/src/site/markdown/aql/similarity.md +++ /dev/null @@ -1,227 +0,0 @@ -<!-- - ! Licensed to the Apache Software Foundation (ASF) under one - ! or more contributor license agreements. See the NOTICE file - ! distributed with this work for additional information - ! regarding copyright ownership. The ASF licenses this file - ! to you under the Apache License, Version 2.0 (the - ! "License"); you may not use this file except in compliance - ! with the License. You may obtain a copy of the License at - ! - ! http://www.apache.org/licenses/LICENSE-2.0 - ! - ! Unless required by applicable law or agreed to in writing, - ! software distributed under the License is distributed on an - ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - ! KIND, either express or implied. See the License for the - ! specific language governing permissions and limitations - ! under the License. 
- !--> - -# AsterixDB Support of Similarity Queries # - -## <a id="toc">Table of Contents</a> ## - -* [Motivation](#Motivation) -* [Data Types and Similarity Functions](#DataTypesAndSimilarityFunctions) -* [Similarity Selection Queries](#SimilaritySelectionQueries) -* [Similarity Join Queries](#SimilarityJoinQueries) -* [Using Indexes to Support Similarity Queries](#UsingIndexesToSupportSimilarityQueries) - -## <a id="Motivation">Motivation</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ## - -Similarity queries are widely used in applications where users need to -find objects that satisfy a similarity predicate, while exact matching -is not sufficient. These queries are especially important for social -and Web applications, where errors, abbreviations, and inconsistencies -are common. As an example, we may want to find all the movies -starring Schwarzenegger, while we don't know the exact spelling of his -last name (despite his popularity in both the movie industry and -politics :-)). As another example, we want to find all the Facebook -users who have similar friends. To meet this type of needs, AsterixDB -supports similarity queries using efficient indexes and algorithms. - -## <a id="DataTypesAndSimilarityFunctions">Data Types and Similarity Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ## - -AsterixDB supports [edit distance](http://en.wikipedia.org/wiki/Levenshtein_distance) (on strings) and -[Jaccard](http://en.wikipedia.org/wiki/Jaccard_index) (on sets). For -instance, in our -[TinySocial](../sqlpp/primer-sqlpp.html#ADM:_Modeling_Semistructed_Data_in_AsterixDB) -example, the `friendIds` of a Gleambook user forms a set -of friends, and we can define a similarity between the sets of -friends of two users. We can also convert a string to a set of grams of a length "n" -(called "n-grams") and define the Jaccard similarity between the two -gram sets of the two strings. 
Formally, the "n-grams" of a string are -its substrings of length "n". For instance, the 3-grams of the string -`schwarzenegger` are `sch`, `chw`, `hwa`, ..., `ger`. - -AsterixDB provides -[tokenization functions](../sqlpp/builtins.html#Tokenizing_Functions) -to convert strings to sets, and the -[similarity functions](../sqlpp/builtins.html#Similarity_Functions). - -## <a id="SimilaritySelectionQueries">Similarity Selection Queries</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ## - -The following query -asks for all the Gleambook users whose name is similar to -`Suzanna Tilson`, i.e., their edit distance is at most 2. - - use TinySocial; - - select u - from GleambookUsers u - where edit_distance(u.name, "Suzanna Tilson") <= 2; - -The following query -asks for all the Gleambook users whose set of friend ids is -similar to `[1,5,9,10]`, i.e., their Jaccard similarity is at least 0.6. - - use TinySocial; - - select u - from GleambookUsers u - where similarity_jaccard(u.friendIds, [1,5,9,10]) >= 0.6f; - -AsterixDB allows a user to use a similarity operator `~=` to express a -condition by defining the similarity function and threshold -using "set" statements earlier. For instance, the above query can be -equivalently written as: - - use TinySocial; - - set simfunction "jaccard"; - set simthreshold "0.6f"; - - select u - from GleambookUsers u - where u.friendIds ~= [1,5,9,10]; - -In this query, we first declare Jaccard as the similarity function -using `simfunction` and then specify the threshold `0.6f` using -`simthreshold`. - -## <a id="SimilarityJoinQueries">Similarity Join Queries</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ## - -AsterixDB supports fuzzy joins between two sets. The following -[query](../sqlpp/primer-sqlpp.html#Query_5_-_Fuzzy_Join) -finds, for each Gleambook user, all Chirp users with names -similar to their name based on the edit distance. 
- - use TinySocial; - - set simfunction "edit-distance"; - set simthreshold "3"; - - select gbu.id, gbu.name, (select cu.screenName, cu.name - from ChirpUsers cu - where cu.name ~= gbu.name) as similar_users - from GleambookUsers gbu; - -## <a id="UsingIndexesToSupportSimilarityQueries">Using Indexes to Support Similarity Queries</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ## - -AsterixDB uses two types of indexes to support similarity queries, namely -"ngram index" and "keyword index". - -### NGram Index ### - -An "ngram index" is constructed on a set of strings. We generate n-grams for each string, and build an inverted -list for each n-gram that includes the ids of the strings with this -gram. A similarity query can be answered efficiently by accessing the -inverted lists of the grams in the query and counting the number of -occurrences of the string ids on these inverted lists. The similar -idea can be used to answer queries with Jaccard similarity. A -detailed description of these techniques is available at this -[paper](http://www.ics.uci.edu/~chenli/pub/icde2009-memreducer.pdf). - -For instance, the following DDL statements create an ngram index on the -`GleambookUsers.name` attribute using an inverted index of 3-grams. - - use TinySocial; - - create index gbUserIdx on GleambookUsers(name) type ngram(3); - -The number "3" in "ngram(3)" is the length "n" in the grams. This -index can be used to optimize similarity queries on this attribute -using -[edit_distance](../sqlpp/builtins.html#edit_distance), -[edit_distance_check](../sqlpp/builtins.html#edit_distance_check), -[similarity_jaccard](../sqlpp/builtins.html#similarity_jaccard), -or [similarity_jaccard_check](../sqlpp/builtins.html#similarity_jaccard_check) -queries on this attribute where the -similarity is defined on sets of 3-grams. 
This index can also be used -to optimize queries with the "[contains()]((../sqlpp/builtins.html#contains))" predicate (i.e., substring -matching) since it can be also be solved by counting on the inverted -lists of the grams in the query string. - -#### NGram Index usage case - [edit_distance](../sqlpp/builtins.html#edit-distance) #### - - use TinySocial; - - select u - from GleambookUsers u - where edit_distance(u.name, "Suzanna Tilson") <= 2; - -#### NGram Index usage case - [edit_distance_check](../sqlpp/builtins.html#edit_distance_check) #### - - use TinySocial; - - select u - from GleambookUsers u - where edit_distance_check(u.name, "Suzanna Tilson", 2)[0]; - -#### NGram Index usage case - [contains()]((../sqlpp/builtins.html#contains)) #### - - use TinySocial; - - select m - from GleambookMessages m - where contains(m.message, "phone"); - - -### Keyword Index ### - -A "keyword index" is constructed on a set of strings or sets (e.g., array, multiset). Instead of -generating grams as in an ngram index, we generate tokens (e.g., words) and for each token, construct an inverted list that includes the ids of the -objects with this token. 
The following two examples show how to create keyword index on two different types: - - -#### Keyword Index on String Type #### - - use TinySocial; - - drop index GleambookMessages.gbMessageIdx if exists; - create index gbMessageIdx on GleambookMessages(message) type keyword; - - select m - from GleambookMessages m - where similarity_jaccard_check(word_tokens(m.message), word_tokens("love like ccast"), 0.2f)[0]; - -#### Keyword Index on Multiset Type #### - - use TinySocial; - - create index gbUserIdxFIds on GleambookUsers(friendIds) type keyword; - - select u - from GleambookUsers u - where similarity_jaccard_check(u.friendIds, {{3,10}}, 0.5f)[0]; - -As shown above, keyword index can be used to optimize queries with token-based similarity predicates, including -[similarity_jaccard](../sqlpp/builtins.html#similarity_jaccard) and -[similarity_jaccard_check](../sqlpp/builtins.html#similarity_jaccard_check). - -#### Keyword Index usage case - [similarity_jaccard](../sqlpp/builtins.html#similarity_jaccard) #### - - use TinySocial; - - select u - from GleambookUsers u - where similarity_jaccard(u.friendIds, [1,5,9,10]) >= 0.6f; - -#### Keyword Index usage case - [similarity_jaccard_check](../sqlpp/builtins.html#similarity_jaccard_check) #### - - use TinySocial; - - select u - from GleambookUsers u - where similarity_jaccard_check(u.friendIds, [1,5,9,10], 0.6f)[0]; - http://git-wip-us.apache.org/repos/asf/asterixdb/blob/8bbf0813/asterixdb/asterix-doc/src/site/markdown/sqlpp/filters.md ---------------------------------------------------------------------- diff --git a/asterixdb/asterix-doc/src/site/markdown/sqlpp/filters.md b/asterixdb/asterix-doc/src/site/markdown/sqlpp/filters.md new file mode 100644 index 0000000..6b8e00f --- /dev/null +++ b/asterixdb/asterix-doc/src/site/markdown/sqlpp/filters.md @@ -0,0 +1,147 @@ +<!-- + ! Licensed to the Apache Software Foundation (ASF) under one + ! or more contributor license agreements. See the NOTICE file + ! 
distributed with this work for additional information + ! regarding copyright ownership. The ASF licenses this file + ! to you under the Apache License, Version 2.0 (the + ! "License"); you may not use this file except in compliance + ! with the License. You may obtain a copy of the License at + ! + ! http://www.apache.org/licenses/LICENSE-2.0 + ! + ! Unless required by applicable law or agreed to in writing, + ! software distributed under the License is distributed on an + ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ! KIND, either express or implied. See the License for the + ! specific language governing permissions and limitations + ! under the License. + !--> + +# Filter-Based LSM Index Acceleration + +## <a id="toc">Table of Contents</a> + +* [Motivation](#Motivation) +* [Filters in AsterixDB](#FiltersInAsterixDB) +* [Filters and Merge Policies](#FiltersAndMergePolicies) + +## <a id="Motivation">Motivation</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> + +Traditional relational databases usually employ conventional index +structures such as B+ trees due to their low read latency. However, +such traditional index structures use in-place writes to perform +updates, resulting in costly random writes to disk. Today's emerging +applications often involve insert-intensive workloads for which the +cost of random writes prohibits efficient ingestion of +data. Consequently, popular NoSQL systems such as Cassandra, HBase, +LevelDB, BigTable, etc. have adopted Log-Structured Merge (LSM) Trees +as their storage structure. LSM-trees avoids the cost of random writes +by batching updates into a component of the index that resides in main +memory -- an *in-memory component*. When the space occupancy of +the in-memory component exceeds a specified threshold, its entries are +*flushed* to disk forming a new component -- a *disk component*. 
As +disk components accumulate on disk, they are periodically merged +together subject to a *merge policy* that decides when and what to +merge. The benefit of the LSM-trees comes at the cost of possibly +sacrificing read efficiency, but, it has been shown in previous +studies that these inefficiencies can be mostly mitigated. + +AsterixDB has also embraced LSM-trees, not just by using them as +primary indexes, but also by using the same LSM-ification technique +for all of its secondary index structures. In particular, AsterixDB +adopted a generic framework for converting a class of indexes (that +includes conventional B+ trees, R trees, and inverted indexes) into +LSM-based secondary indexes, allowing higher data ingestion rates. In +fact, for certain index structures, our results have shown that using +an LSM-based version of an index can be made to significantly +outperform its conventional counterpart for *both* ingestion +and query speed (an example of such an index being the R-tree for +spatial data). + +Since an LSM-based index naturally partitions data into multiple disk +components, it is possible, when answering certain queries, to exploit +partitioning to only access some components and safely filter out the +remaining components, thus reducing query times. For instance, +referring to our +[TinySocial](primer.html#ADM:_Modeling_Semistructed_Data_in_AsterixDB) +example, suppose a user always retrieves tweets from the +`TweetMessages` dataset based on the `send-time` field (e.g., tweets +posted in the last 24 hours). Since there is not a secondary index on +the `send-time` field, the only available option for AsterixDB would +be to scan the whole `TweetMessages` dataset and then apply the +predicate as a post-processing step. 
However, if disk components of +the primary index were tagged with the minimum and maximum timestamp +values of the objects they contain, we could utilize the tagged +information to directly access the primary index and prune components +that do not match the query predicate. Thus, we could save substantial +cost by avoiding scanning the whole dataset and only access the +relevant components. We simply call such tagging information that are +associated with components, filters. (Note that even if there were a +secondary index on `send-time` field, using filters could save +substantial cost by avoiding accessing the secondary index, followed +by probing the primary index for every fetched entry.) Moreover, the +same filtering technique can also be used with any secondary LSM index +(e.g., an LSM R-tree), in case the query contains multiple predicates +(e.g., spatial and temporal predicates), to obtain similar pruning +power. + +## <a id="FiltersInAsterixDB">Filters in AsterixDB</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> + +We have added support for LSM-based filters to all of AsterixDB's +index types. To enable the use of filters, the user must specify the +filter's key when creating a dataset, as shown below: + +#### Creating a Dataset with a Filter #### + + create dataset Tweets(TweetType) primary key tweetid with filter on send-time; + +Filters can be created on any totally ordered datatype (i.e., any +field that can be indexed using a B+ -tree), such as integers, +doubles, floats, UUIDs, datetimes, etc. + +When a dataset with a filter is created, the name of the filter's key +field is persisted in the `Metadata.Dataset` dataset (which is the metadata +dataset that stores the details of each dataset in an AsterixDB +instance) so that DML operations against the dataset can recognize the +existence of filters and can update them or utilize them +accordingly. 
Creating a dataset with a filter in AsterixDB implies +that the primary and all secondary indexes of that dataset will +maintain filters on their disk components. Once a filtered dataset is +created, the user can use the dataset normally (just like any other +dataset). AsterixDB will automatically maintain the filters and will +leverage them to efficiently answer queries whenever possible (i.e., +when a query has predicates on the filter's key). + +## <a id="FiltersAndMergePolicies">Filters and Merge Policies</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> + +The AsterixDB default merge policy, the prefix merge policy, relies on +component sizes and the number of components to decide which +components to merge. This merge policy has proven to provide excellent +performance for both ingestion and queries. However, when evaluating +our filtering solution with the prefix policy, we observed a behavior +that can reduce filter effectiveness. In particular, we noticed that +under the prefix merge policy, the disk components of a secondary +index tend to be constantly merged into a single component. This is +because the prefix policy relies on a single size parameter for all of +the indexes of a dataset. This parameter is typically chosen based on +the sizes of the disk components of the primary index, which tend to +be much larger than the sizes of the secondary indexes' disk +components. This difference caused the prefix merge policy to behave +similarly to the constant merge policy (i.e., relatively poorly) when +applied to secondary indexes in the sense that the secondary indexes +are constantly merged into a single disk component. Consequently, the +effectiveness of filters on secondary indexes was greatly reduced +under the prefix-merge policy, but they were still effective when +probing the primary index. Based on this behavior, we developed a new +merge policy, an improved version of the prefix policy, called the +correlated-prefix policy. 
The basic idea of this policy is that it +delegates the decision of merging the disk components of all the +indexes in a dataset to the primary index. When the policy decides +that the primary index needs to be merged (using the same decision +criteria as for the prefix policy), then it will issue successive +merge requests to the I/O scheduler on behalf of all other indexes +associated with the same dataset. The end result is that secondary +indexes will always have the same number of disk components as their +primary index under the correlated-prefix merge policy. This has +improved query performance, since disk components of secondary indexes +now have a much better chance of being pruned. http://git-wip-us.apache.org/repos/asf/asterixdb/blob/8bbf0813/asterixdb/asterix-doc/src/site/markdown/sqlpp/fulltext.md ---------------------------------------------------------------------- diff --git a/asterixdb/asterix-doc/src/site/markdown/sqlpp/fulltext.md b/asterixdb/asterix-doc/src/site/markdown/sqlpp/fulltext.md new file mode 100644 index 0000000..1328ed9 --- /dev/null +++ b/asterixdb/asterix-doc/src/site/markdown/sqlpp/fulltext.md @@ -0,0 +1,114 @@ +<!-- + ! Licensed to the Apache Software Foundation (ASF) under one + ! or more contributor license agreements. See the NOTICE file + ! distributed with this work for additional information + ! regarding copyright ownership. The ASF licenses this file + ! to you under the Apache License, Version 2.0 (the + ! "License"); you may not use this file except in compliance + ! with the License. You may obtain a copy of the License at + ! + ! http://www.apache.org/licenses/LICENSE-2.0 + ! + ! Unless required by applicable law or agreed to in writing, + ! software distributed under the License is distributed on an + ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ! KIND, either express or implied. See the License for the + ! specific language governing permissions and limitations + ! under the License. 
+ !--> + +# AsterixDB Support of Full-text search queries # + +## <a id="toc">Table of Contents</a> ## + +* [Motivation](#Motivation) +* [Syntax](#Syntax) +* [Creating and utilizing a Full-text index](#FulltextIndex) + +## <a id="Motivation">Motivation</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ## + +Full-Text Search (FTS) queries are widely used in applications where users need to find records that satisfy +an FTS predicate, i.e., where simple string-based matching is not sufficient. These queries are important when +finding documents that contain a certain keyword is crucial. FTS queries are different from substring matching +queries in that FTS queries find their query predicates as exact keywords in the given string, rather than +treating a query predicate as a sequence of characters. For example, an FTS query that finds “rain” correctly +returns a document when it contains “rain” as a word. However, a substring-matching query returns a document +whenever it contains “rain” as a substring, for instance, a document with “brain” or “training” would be +returned as well. + +## <a id="Syntax">Syntax</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ## + +The syntax of AsterixDB FTS follows a portion of the XQuery FullText Search syntax. +Two basic forms are as follows: + + ftcontains(Expression1, Expression2, {FullTextOption}) + ftcontains(Expression1, Expression2) + +For example, we can execute the following query to find Chirp messages where the `messageText` field includes +“voice” as a word. Please note that an FTS search is case-insensitive. +Thus, "Voice" or "voice" will be evaluated as the same word. + + use TinySocial; + + select element {"chirpId": msg.chirpId} + from ChirpMessages msg + where ftcontains(msg.messageText, "voice", {"mode":"any"}); + +The DDL and DML of TinySocial can be found in [ADM: Modeling Semistructed Data in AsterixDB](../sqlpp/primer-sqlpp.html#ADM:_Modeling_Semistructed_Data_in_AsterixDB). 
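The difference between keyword matching and substring matching described in the Motivation section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`word_tokens`, `keyword_match`, `substring_match`) are hypothetical, the tokenizer is a simple regular expression rather than AsterixDB's actual tokenizer, and both helpers lowercase their input so that only the word-boundary behavior differs.

```python
import re


def word_tokens(text):
    """Split text into lowercase alphanumeric tokens (a simplified, assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_match(text, keyword):
    """FTS-style check: the keyword must appear as a whole word, ignoring case."""
    return keyword.lower() in word_tokens(text)


def substring_match(text, keyword):
    """Substring-style check: the keyword may appear anywhere, even inside a longer word."""
    return keyword.lower() in text.lower()


docs = ["Heavy rain is expected", "Brain training games", "Rain or shine"]

# Only documents that contain "rain" as a word.
fts_hits = [d for d in docs if keyword_match(d, "rain")]

# Every document, because "brain" and "training" contain "rain" as a substring.
sub_hits = [d for d in docs if substring_match(d, "rain")]
```

Here `fts_hits` keeps the first and third documents, while `sub_hits` also picks up "Brain training games", which is the kind of false positive an FTS predicate avoids.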
+ +The `Expression1` is an expression that should be evaluable as a string at runtime as in the above example +where `msg.messageText` is a string field. The `Expression2` can be a string, an (un)ordered list +of string value(s), or an expression. In the last case, the given expression should be evaluable +into one of the first two types, i.e., into a string value or an (un)ordered list of string value(s). + +The following examples are all valid expressions. + + ... where ftcontains(msg.messageText, "sound") + ... where ftcontains(msg.messageText, "sound", {"mode":"any"}) + ... where ftcontains(msg.messageText, ["sound", "system"], {"mode":"any"}) + ... where ftcontains(msg.messageText, {{"speed", "stand", "customization"}}, {"mode":"all"}) + +The last `FullTextOption` parameter clarifies the given FTS request. If you omit the `FullTextOption` parameter, +then the default value will be set for each possible option. Currently, we only have one option named `mode`. +And as we extend the FTS feature, more options will be added. Please note that the format of `FullTextOption` +is a record, thus you need to put the option(s) in a record `{}`. +The `mode` option indicates whether the given FTS query is a conjunctive (AND) or disjunctive (OR) search request. +This option can be either `“all”` (AND) or `“any”` (OR). The default value for `mode` is `“all”`. If one specifies `“any”`, +a disjunctive search will be conducted. For example, the following query will find documents whose `messageText` +field contains “sound” or “system”, so a document will be returned if it contains either “sound”, “system”, +or both of the keywords. + + ... where ftcontains(msg.messageText, ["sound", "system"], {"mode":"any"}) + +The other option parameter, `“all”`, specifies a conjunctive search. The following examples will find the documents whose +`messageText` field contains both “sound” and “system”. If a document contains only “sound” or “system” but +not both, it will not be returned. 
+ + ... where ftcontains(msg.messageText, ["sound", "system"], {"mode":"all"}) + ... where ftcontains(msg.messageText, ["sound", "system"]) + +Currently AsterixDB doesn’t (yet) support phrase searches, so the following query will not work. + + ... where ftcontains(msg.messageText, "sound system", {"mode":"any"}) + +As a workaround solution, the following query can be used to achieve a roughly similar goal. The difference is that +the following queries will find documents where `msg.messageText` contains both “sound” and “system”, but the order +and adjacency of “sound” and “system” are not checked, unlike in a phrase search. As a result, the query below would +also return documents with “sound system can be installed.”, “system sound is perfect.”, +or “sound is not clear. You may need to install a new system.” + + ... where ftcontains(msg.messageText, ["sound", "system"], {"mode":"all"}) + ... where ftcontains(msg.messageText, ["sound", "system"]) + + +## <a id="FulltextIndex">Creating and utilizing a Full-text index</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ## + +When there is a full-text index on the field that is being searched, rather than scanning all records, +AsterixDB can utilize that index to expedite the execution of a FTS query. To create a full-text index, +you need to specify the index type as `fulltext` in your DDL statement. For instance, the following DDL +statement creates a full-text index on the `GleambookMessages.message` attribute. Note that a full-text index +cannot be built on a dataset with a variable-length primary key (e.g., string). 
+ + use TinySocial; + + create index messageFTSIdx on GleambookMessages(message) type fulltext; http://git-wip-us.apache.org/repos/asf/asterixdb/blob/8bbf0813/asterixdb/asterix-doc/src/site/markdown/sqlpp/similarity.md ---------------------------------------------------------------------- diff --git a/asterixdb/asterix-doc/src/site/markdown/sqlpp/similarity.md b/asterixdb/asterix-doc/src/site/markdown/sqlpp/similarity.md new file mode 100644 index 0000000..8118126 --- /dev/null +++ b/asterixdb/asterix-doc/src/site/markdown/sqlpp/similarity.md @@ -0,0 +1,227 @@ +<!-- + ! Licensed to the Apache Software Foundation (ASF) under one + ! or more contributor license agreements. See the NOTICE file + ! distributed with this work for additional information + ! regarding copyright ownership. The ASF licenses this file + ! to you under the Apache License, Version 2.0 (the + ! "License"); you may not use this file except in compliance + ! with the License. You may obtain a copy of the License at + ! + ! http://www.apache.org/licenses/LICENSE-2.0 + ! + ! Unless required by applicable law or agreed to in writing, + ! software distributed under the License is distributed on an + ! "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ! KIND, either express or implied. See the License for the + ! specific language governing permissions and limitations + ! under the License. 
+ !--> + +# AsterixDB Support of Similarity Queries # + +## <a id="toc">Table of Contents</a> ## + +* [Motivation](#Motivation) +* [Data Types and Similarity Functions](#DataTypesAndSimilarityFunctions) +* [Similarity Selection Queries](#SimilaritySelectionQueries) +* [Similarity Join Queries](#SimilarityJoinQueries) +* [Using Indexes to Support Similarity Queries](#UsingIndexesToSupportSimilarityQueries) + +## <a id="Motivation">Motivation</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ## + +Similarity queries are widely used in applications where users need to +find objects that satisfy a similarity predicate, when exact matching +is not sufficient. These queries are especially important for social +and Web applications, where errors, abbreviations, and inconsistencies +are common. As an example, we may want to find all the movies +starring Schwarzenegger, while we don't know the exact spelling of his +last name (despite his popularity in both the movie industry and +politics :-)). As another example, we may want to find all the Facebook +users who have similar friends. To meet this type of need, AsterixDB +supports similarity queries using efficient indexes and algorithms. + +## <a id="DataTypesAndSimilarityFunctions">Data Types and Similarity Functions</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ## + +AsterixDB supports [edit distance](http://en.wikipedia.org/wiki/Levenshtein_distance) (on strings) and +[Jaccard](http://en.wikipedia.org/wiki/Jaccard_index) (on sets). For +instance, in our +[TinySocial](../sqlpp/primer-sqlpp.html#ADM:_Modeling_Semistructed_Data_in_AsterixDB) +example, the `friendIds` of a Gleambook user form a set +of friends, and we can define a similarity between the sets of +friends of two users. We can also convert a string to a set of grams of length "n" +(called "n-grams") and define the Jaccard similarity between the two +gram sets of the two strings. 
Formally, the "n-grams" of a string are +its substrings of length "n". For instance, the 3-grams of the string +`schwarzenegger` are `sch`, `chw`, `hwa`, ..., `ger`. + +AsterixDB provides +[tokenization functions](../sqlpp/builtins.html#Tokenizing_Functions) +to convert strings to sets, and the +[similarity functions](../sqlpp/builtins.html#Similarity_Functions). + +## <a id="SimilaritySelectionQueries">Similarity Selection Queries</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ## + +The following query +asks for all the Gleambook users whose name is similar to +`Suzanna Tilson`, i.e., their edit distance is at most 2. + + use TinySocial; + + select u + from GleambookUsers u + where edit_distance(u.name, "Suzanna Tilson") <= 2; + +The following query +asks for all the Gleambook users whose set of friend ids is +similar to `[1,5,9,10]`, i.e., their Jaccard similarity is at least 0.6. + + use TinySocial; + + select u + from GleambookUsers u + where similarity_jaccard(u.friendIds, [1,5,9,10]) >= 0.6f; + +AsterixDB allows a user to use a similarity operator `~=` to express a +condition by defining the similarity function and threshold +using "set" statements earlier. For instance, the above query can be +equivalently written as: + + use TinySocial; + + set simfunction "jaccard"; + set simthreshold "0.6f"; + + select u + from GleambookUsers u + where u.friendIds ~= [1,5,9,10]; + +In this query, we first declare Jaccard as the similarity function +using `simfunction` and then specify the threshold `0.6f` using +`simthreshold`. + +## <a id="SimilarityJoinQueries">Similarity Join Queries</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ## + +AsterixDB supports fuzzy joins between two sets. The following +[query](../sqlpp/primer-sqlpp.html#Query_5_-_Fuzzy_Join) +finds, for each Gleambook user, all Chirp users with names +similar to their name based on the edit distance. 
+ + use TinySocial; + + set simfunction "edit-distance"; + set simthreshold "3"; + + select gbu.id, gbu.name, (select cu.screenName, cu.name + from ChirpUsers cu + where cu.name ~= gbu.name) as similar_users + from GleambookUsers gbu; + +## <a id="UsingIndexesToSupportSimilarityQueries">Using Indexes to Support Similarity Queries</a> <font size="4"><a href="#toc">[Back to TOC]</a></font> ## + +AsterixDB uses two types of indexes to support similarity queries, namely +"ngram index" and "keyword index". + +### NGram Index ### + +An "ngram index" is constructed on a set of strings. We generate n-grams for each string, and build an inverted +list for each n-gram that includes the ids of the strings with this +gram. A similarity query can be answered efficiently by accessing the +inverted lists of the grams in the query and counting the number of +occurrences of the string ids on these inverted lists. A similar +idea can be used to answer queries with Jaccard similarity. A +detailed description of these techniques is available in this +[paper](http://www.ics.uci.edu/~chenli/pub/icde2009-memreducer.pdf). + +For instance, the following DDL statement creates an ngram index on the +`GleambookUsers.name` attribute using an inverted index of 3-grams. + + use TinySocial; + + create index gbUserIdx on GleambookUsers(name) type ngram(3); + +The number "3" in "ngram(3)" is the length "n" in the grams. This +index can be used to optimize similarity queries on this attribute +using +[edit_distance](../sqlpp/builtins.html#edit_distance), +[edit_distance_check](../sqlpp/builtins.html#edit_distance_check), +[similarity_jaccard](../sqlpp/builtins.html#similarity_jaccard), +or [similarity_jaccard_check](../sqlpp/builtins.html#similarity_jaccard_check), +where the +similarity is defined on sets of 3-grams. 
This index can also be used +to optimize queries with the "[contains()](../sqlpp/builtins.html#contains)" predicate (i.e., substring +matching) since it can also be solved by counting on the inverted +lists of the grams in the query string. + +#### NGram Index usage case - [edit_distance](../sqlpp/builtins.html#edit-distance) #### + + use TinySocial; + + select u + from GleambookUsers u + where edit_distance(u.name, "Suzanna Tilson") <= 2; + +#### NGram Index usage case - [edit_distance_check](../sqlpp/builtins.html#edit_distance_check) #### + + use TinySocial; + + select u + from GleambookUsers u + where edit_distance_check(u.name, "Suzanna Tilson", 2)[0]; + +#### NGram Index usage case - [contains()](../sqlpp/builtins.html#contains) #### + + use TinySocial; + + select m + from GleambookMessages m + where contains(m.message, "phone"); + + +### Keyword Index ### + +A "keyword index" is constructed on a set of strings or sets (e.g., array, multiset). Instead of +generating grams as in an ngram index, we generate tokens (e.g., words) and for each token, construct an inverted list that includes the ids of the +objects with this token. 
The following two examples show how to create a keyword index on two different types: + + +#### Keyword Index on String Type #### + + use TinySocial; + + drop index GleambookMessages.gbMessageIdx if exists; + create index gbMessageIdx on GleambookMessages(message) type keyword; + + select m + from GleambookMessages m + where similarity_jaccard_check(word_tokens(m.message), word_tokens("love like ccast"), 0.2f)[0]; + +#### Keyword Index on Multiset Type #### + + use TinySocial; + + create index gbUserIdxFIds on GleambookUsers(friendIds) type keyword; + + select u + from GleambookUsers u + where similarity_jaccard_check(u.friendIds, {{3,10}}, 0.5f)[0]; + +As shown above, a keyword index can be used to optimize queries with token-based similarity predicates, including +[similarity_jaccard](../sqlpp/builtins.html#similarity_jaccard) and +[similarity_jaccard_check](../sqlpp/builtins.html#similarity_jaccard_check). + +#### Keyword Index usage case - [similarity_jaccard](../sqlpp/builtins.html#similarity_jaccard) #### + + use TinySocial; + + select u + from GleambookUsers u + where similarity_jaccard(u.friendIds, [1,5,9,10]) >= 0.6f; + +#### Keyword Index usage case - [similarity_jaccard_check](../sqlpp/builtins.html#similarity_jaccard_check) #### + + use TinySocial; + + select u + from GleambookUsers u + where similarity_jaccard_check(u.friendIds, [1,5,9,10], 0.6f)[0]; + http://git-wip-us.apache.org/repos/asf/asterixdb/blob/8bbf0813/asterixdb/asterix-doc/src/site/site.xml ---------------------------------------------------------------------- diff --git a/asterixdb/asterix-doc/src/site/site.xml b/asterixdb/asterix-doc/src/site/site.xml index 99947ad..1167c37 100644 --- a/asterixdb/asterix-doc/src/site/site.xml +++ b/asterixdb/asterix-doc/src/site/site.xml @@ -71,36 +71,36 @@ </menu> <menu name = "AsterixDB Primer"> - <item name="Option 1: using SQL++" href="sqlpp/primer-sqlpp.html"/> - <item name="Option 2: using AQL" href="aql/primer.html"/> + <item name="Using SQL++" 
href="sqlpp/primer-sqlpp.html"/> </menu> <menu name="Data Model"> <item name="The Asterix Data Model" href="datamodel.html"/> </menu> - <menu name="Queries - SQL++"> + <menu name="Queries"> <item name="The SQL++ Query Language" href="sqlpp/manual.html"/> <item name="Builtin Functions" href="sqlpp/builtins.html"/> </menu> - <menu name="Queries - AQL"> - <item name="The Asterix Query Language (AQL)" href="aql/manual.html"/> - <item name="Builtin Functions" href="aql/builtins.html"/> - </menu> - <menu name="API/SDK"> <item name="HTTP API" href="api.html"/> <item name="CSV Output" href="csv.html"/> </menu> <menu name="Advanced Features"> - <item name="Support of Full-text Queries" href="aql/fulltext.html"/> <item name="Accessing External Data" href="aql/externaldata.html"/> <item name="Support for Data Ingestion" href="feeds/tutorial.html"/> <item name="User Defined Functions" href="udf.html"/> - <item name="Filter-Based LSM Index Acceleration" href="aql/filters.html"/> - <item name="Support of Similarity Queries" href="aql/similarity.html"/> + <item name="Filter-Based LSM Index Acceleration" href="sqlpp/filters.html"/> + <item name="Support of Full-text Queries" href="sqlpp/fulltext.html"/> + <item name="Support of Similarity Queries" href="sqlpp/similarity.html"/> + </menu> + + <menu name="Deprecated"> + <item name="AsterixDB Primer: Using AQL" href="aql/primer.html"/> + <item name="Queries: The Asterix Query Language (AQL)" href="aql/manual.html"/> + <item name="Queries: Builtin Functions (AQL)" href="aql/builtins.html"/> </menu> <menu ref="reports"/>
