Taewoo Kim has posted comments on this change.

Change subject: Fulltext search initial implementation
......................................................................

Patch Set 5:

(1 comment)

https://asterix-gerrit.ics.uci.edu/#/c/989/5/asterixdb/asterix-doc/src/site/markdown/aql/manual.md
File asterixdb/asterix-doc/src/site/markdown/aql/manual.md:

Line 720: `rtree` for spatial data, and `keyword`, `ngram`, and `fulltext` for textual (string) data.
> what is the difference between `keyword` and `fulltext`?

The keyword index is a length-partitioned index, while the full-text index is a single-partition index. For a length-partitioned index, we first group the indexed field values by length (= number of tokens), then tokenize each value and store its tokens. So an index entry looks like [7][president][PK1]. Here, 7 is the number of tokens in the indexed field of the record whose PK is PK1, and "president" is a word token. The length is needed to compute similarity quickly, because the similarity calculation can use lower and upper bounds on the number of tokens. For full-text search, this is not required; in fact, the "partition" feature should be disabled. So, for a full-text index, we just store [president][PK1].


-- 
To view, visit https://asterix-gerrit.ics.uci.edu/989

Gerrit-MessageType: comment
Gerrit-Change-Id: I71887c2ea847e4488f4c98a11f8a5bcad02cac5a
Gerrit-PatchSet: 5
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Owner: Taewoo Kim
Gerrit-Reviewer: Heri Ramampiaro
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: Jianfeng Jia
Gerrit-Reviewer: Michael Blow
Gerrit-Reviewer: Taewoo Kim
Gerrit-Reviewer: Till Westmann
Gerrit-HasComments: Yes
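
To make the two entry layouts from the comment above concrete, here is a minimal Python sketch. It is not AsterixDB code: `tokenize`, `keyword_entries`, `fulltext_entries`, and `jaccard_length_bounds` are hypothetical names, the whitespace tokenizer is a stand-in, and the use of Jaccard similarity for the length bounds is an assumption made only for illustration.

```python
import math
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Hypothetical word tokenizer: lowercase, split on whitespace."""
    return text.lower().split()


def keyword_entries(text: str, pk: str) -> List[Tuple[int, str, str]]:
    """Length-partitioned layout: [token count][token][PK]."""
    tokens = tokenize(text)
    return [(len(tokens), tok, pk) for tok in tokens]


def fulltext_entries(text: str, pk: str) -> List[Tuple[str, str]]:
    """Single-partition layout: [token][PK], no length prefix."""
    return [(tok, pk) for tok in tokenize(text)]


def jaccard_length_bounds(query_len: int, threshold: float) -> Tuple[int, int]:
    """Assumed Jaccard length filter: a record can only reach the
    threshold if its token count lies within these bounds, so only
    those length partitions need to be searched."""
    lower = math.ceil(threshold * query_len)
    upper = math.floor(query_len / threshold)
    return lower, upper


field = "the president spoke to the press today"  # 7 tokens
kw = keyword_entries(field, "PK1")
ft = fulltext_entries(field, "PK1")
bounds = jaccard_length_bounds(7, 0.5)
print(kw[1], ft[1], bounds)
```

With this input, the keyword layout contains the entry (7, "president", "PK1") and the full-text layout contains ("president", "PK1"). The length prefix only pays off when a similarity threshold lets the search skip whole length partitions, which is why a plain full-text lookup has no use for it.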
