GitHub user chenlica closed a discussion: Keyword Match Operator (from old wiki)

>From the page https://github.com/apache/texera/wiki/Keyword-Match-Operator 
>(may be dangling)

====

Authors: [Akshay Jain](https://github.com/akshaybetala), [Prakul 
Agarwal](https://github.com/prakul)

Reviewers: [Chen Li](https://github.com/chenlica)

### Synopsis

Implement an operator that wraps Lucene's search capability to support 
keyword and phrase search. 

### Status
As of 5/30/2016: **COMPLETED**

### Modules:
```
edu.uci.ics.texera.dataflow.common
edu.uci.ics.texera.dataflow.keywordmatch
```

### Related Issues:

https://github.com/Texera/texera/issues/31

### Description
The Keyword Operator performs *Keyword Search* and *Phrase Search*. It follows 
an iterator-based design: call the `getNextTuple()` function to get the next 
result.
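
The iterator-based design can be illustrated with a minimal sketch. Only `getNextTuple()` comes from this page; the `SketchOperator` interface, the `open()`/`close()` methods, the use of `String` as the tuple type, and the convention that `null` signals exhaustion are assumptions made for illustration and may not match the actual Texera interfaces.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified operator contract (not the real Texera interface).
interface SketchOperator {
    void open();
    String getNextTuple(); // assumption: returns null when no results remain
    void close();
}

// A toy source operator that serves tuples from an in-memory list.
final class ListSourceOperator implements SketchOperator {
    private final List<String> rows;
    private Iterator<String> cursor;

    ListSourceOperator(List<String> rows) {
        this.rows = rows;
    }

    @Override
    public void open() {
        cursor = rows.iterator();
    }

    @Override
    public String getNextTuple() {
        return (cursor != null && cursor.hasNext()) ? cursor.next() : null;
    }

    @Override
    public void close() {
        cursor = null;
    }
}

final class IteratorSketch {
    // Pulls results one at a time until the operator is exhausted.
    static int drain(SketchOperator op) {
        int count = 0;
        op.open();
        try {
            while (op.getNextTuple() != null) {
                count++;
            }
        } finally {
            op.close();
        }
        return count;
    }
}
```

A caller pulls one result per call, so a downstream operator never has to materialize the full result set.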

**Keyword Search**:

It takes a Keyword Predicate as input, with the query type set to 
`KeywordOperator.BASIC`. It uses `IndexBasedScanOperator`, which returns a 
superset of the desired results. It then filters these results and updates the 
`Span` information accordingly.
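
The filter-and-update step might look roughly like the following sketch. It assumes that a candidate matches only when every query keyword occurs as a whole token, compared case-insensitively, and that one span is recorded per occurrence; the page does not spell out these semantics. `SpanSketch` and `KeywordFilterSketch` are hypothetical names, not Texera classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a span: character offsets plus the matched text.
final class SpanSketch {
    final int start;
    final int end; // exclusive
    final String value;

    SpanSketch(int start, int end, String value) {
        this.start = start;
        this.end = end;
        this.value = value;
    }
}

final class KeywordFilterSketch {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Returns one span per keyword occurrence, or an empty list if any
    // keyword is missing (the candidate is then filtered out).
    // Assumption: keywords are already lowercased.
    static List<SpanSketch> match(String text, List<String> keywords) {
        List<SpanSketch> spans = new ArrayList<>();
        Set<String> found = new HashSet<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            String token = m.group().toLowerCase(Locale.ROOT);
            if (keywords.contains(token)) {
                spans.add(new SpanSketch(m.start(), m.end(), m.group()));
                found.add(token);
            }
        }
        Set<String> required = new HashSet<>(keywords);
        return found.size() == required.size()
                ? spans
                : Collections.<SpanSketch>emptyList();
    }
}
```

With the text `"History of medicine and Medicine today"` and the keywords `medicine` and `history`, this sketch yields three spans, while `"medicinal history"` is rejected because `medicine` never appears as a whole token.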

**Phrase Search**:

It takes a Keyword Predicate as input, with the query type set to 
`KeywordOperator.PHRASE`. It uses `IndexBasedScanOperator`. Using the results 
and `Span` information from the `IndexBasedScanOperator`, it extracts the exact 
text from the document and updates the `Span` information accordingly.
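
As a rough illustration of the phrase case, the sketch below locates each occurrence of the phrase in a field and records the exact text found in the document as the span value. It works directly on the raw text with a regular expression rather than on spans produced by an index scan, which is a simplification; case-insensitive matching and tolerance for variable whitespace between words are also assumptions. `PhraseSpan` and `PhraseMatchSketch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a span covering a whole phrase occurrence.
final class PhraseSpan {
    final int start;
    final int end; // exclusive
    final String value; // exact text as it appears in the document

    PhraseSpan(int start, int end, String value) {
        this.start = start;
        this.end = end;
        this.value = value;
    }
}

final class PhraseMatchSketch {
    // Finds every occurrence of the phrase whose words appear consecutively
    // and in order. Assumption: phrase words start and end with word characters.
    static List<PhraseSpan> match(String text, String phrase) {
        String[] tokens = phrase.trim().split("\\s+");
        StringBuilder regex = new StringBuilder("\\b");
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                regex.append("\\s+");
            }
            regex.append(Pattern.quote(tokens[i]));
        }
        regex.append("\\b");

        Matcher m = Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE)
                .matcher(text);
        List<PhraseSpan> spans = new ArrayList<>();
        while (m.find()) {
            spans.add(new PhraseSpan(m.start(), m.end(), m.group()));
        }
        return spans;
    }
}
```

Unlike the keyword case, a single span here covers the whole phrase, so `"history of medicine"` produces no match for the phrase `medicine history` even though both words are present.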

### Performance Test
Machine configuration: MacBook Pro (Early 2011), 2.3 GHz Intel Core i5, 4 GB 
1333 MHz DDR3

Performance results for `KeywordMatcher` with `KeywordOperatorType.BASIC`:

| Dataset | Index time (s) | Query | Lucene query time (s) | Match time (s) | Total results |
| --- | --- | --- | --- | --- | --- |
| 100K MEDLINE records | 59.8610 | "medicine" | 1.3160 | 10.7240 | 539 |
| 100K MEDLINE records | 59.8610 | "medicine history" | 1.8380 | 0.4580 | 23 |
| 1 million MEDLINE records | 655.8610 | "medicine" | 4.8050 | 7192.9650 | 9114 |
| 1 million MEDLINE records | 655.8610 | "medicine history" | 6.8490 | 18.4780 | 514 |

GitHub link: https://github.com/apache/texera/discussions/3975
