GitHub user chenlica closed a discussion: Keyword Match Operator (from old wiki)
From the page https://github.com/apache/texera/wiki/Keyword-Match-Operator (may be dangling)

====

Authors: [Akshay Jain](https://github.com/akshaybetala), [Prakul Agarwal](https://github.com/prakul)

Reviewers: [Chen Li](https://github.com/chenlica)

### Synopsis

Implement an operator that wraps Lucene's search capability to support keyword and phrase search.

### Status

As of 5/30/2016: **COMPLETED**

### Modules:

```
edu.uci.ics.texera.dataflow.common
edu.uci.ics.texera.dataflow.keywordmatch
```

### Related Issues:

https://github.com/Texera/texera/issues/31

### Description

The Keyword Operator performs *Keyword Search* and *Phrase Search*. It follows an iterator-based design: call `getNextTuple()` to get the next result.

**Keyword Search**: It takes a Keyword Predicate as input, with the query type `KeywordOperator.BASIC`. It uses `IndexBasedScanOperator`, which returns a superset of the desired results. It then filters these results and updates the `Span` information accordingly.

**Phrase Search**: It takes a Keyword Predicate as input, with the query type `KeywordOperator.PHRASE`. It uses `IndexBasedScanOperator`. Using the results and `Span` information from the `IndexBasedScanOperator`, it extracts the exact text from the document and updates the `Span` information accordingly.

### Performance Test

Machine configuration: MacBook Pro (Early 2011), 2.3 GHz Intel Core i5, 4 GB 1333 MHz DDR3

* Dataset: 100k MEDLINE records
  * Performance results for KeywordMatcher with `KeywordOperatorType.BASIC`:
    * Index time: 59.8610 seconds
    * Query: "medicine"
      * Lucene query time: 1.3160 seconds
      * Match time: 10.7240 seconds
      * Total: 539 results
    * Query: "medicine history"
      * Lucene query time: 1.8380 seconds
      * Match time: 0.4580 seconds
      * Total: 23 results
* Dataset: 1 million MEDLINE records
  * Performance results for KeywordMatcher with `KeywordOperatorType.BASIC`:
    * Index time: 655.8610 seconds
    * Query: "medicine"
      * Lucene query time: 4.8050 seconds
      * Match time: 7192.9650 seconds
      * Total: 9114 results
    * Query: "medicine history"
      * Lucene query time: 6.8490 seconds
      * Match time: 18.4780 seconds
      * Total: 514 results

GitHub link: https://github.com/apache/texera/discussions/3975

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
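
As an illustration of the keyword-search step in the Description above (the index scan returns a superset, which is then filtered and annotated with `Span` information), here is a minimal Java sketch. It is not Texera's implementation: the class `KeywordSpanSketch`, its nested `Span` type, and the method `matchKeywords` are hypothetical names. It also assumes that every query keyword must appear in the field (conjunctive matching) and that matching is case-insensitive on alphanumeric tokens; the wiki page confirms neither.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: all names here are hypothetical, not Texera's classes.
final class KeywordSpanSketch {

    /** One keyword occurrence with character offsets; end is exclusive. */
    static final class Span {
        final int start;
        final int end;
        final String keyword;

        Span(int start, int end, String keyword) {
            this.start = start;
            this.end = end;
            this.keyword = keyword;
        }

        @Override
        public String toString() {
            return keyword + "[" + start + "," + end + ")";
        }
    }

    // A token is a maximal run of letters or digits (an assumed tokenization).
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    /**
     * Post-filters one candidate field value returned by an index scan.
     * Returns the spans of every keyword occurrence, or an empty list if
     * any query keyword is missing (assumed conjunctive semantics).
     */
    static List<Span> matchKeywords(String fieldValue, String query) {
        Set<String> keywords = new HashSet<>();
        for (String k : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!k.isEmpty()) {
                keywords.add(k);
            }
        }
        if (keywords.isEmpty()) {
            return Collections.emptyList();
        }

        List<Span> spans = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Matcher m = TOKEN.matcher(fieldValue);
        while (m.find()) {
            String token = m.group().toLowerCase(Locale.ROOT);
            if (keywords.contains(token)) {
                spans.add(new Span(m.start(), m.end(), token));
                seen.add(token);
            }
        }
        // Drop the candidate unless every keyword was found at least once.
        if (!seen.containsAll(keywords)) {
            return Collections.emptyList();
        }
        return spans;
    }
}
```

In an iterator-style operator, a call such as `KeywordSpanSketch.matchKeywords(fieldValue, "medicine history")` would run on each candidate tuple from the scan, and tuples that yield an empty list would be skipped before the next result is returned.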
