GitHub user chenlica created a discussion: Sample Zika Extraction (from old wiki)

> From the page https://github.com/apache/texera/wiki/Sample-Zika-Extraction (may be dangling)

====

For all the operators, leave limit and offset empty  

1. create KeywordSource with properties:  
keyword: zika  
data source: promed  
matching type: conjunction (default)  
attribute: content  

2. create Projection  
attributes: _id, webpage, content  

3. connect KeywordSource with Projection  

4. create Regex_Person  
regex:   
(A|a|(an)|(An)) .{1,40} ((woman)|(man))  
attribute: content  

5. connect Projection with Regex_Person  

6. create NLP_Location  
type: location  
attribute: content  

7. connect Projection with NLP_Location   

8. create Regex_Date  
regex:   
(((0?[1-9])|(1[0-2]))(\s|-|.|\/)((0?[1-9])|([12][0-9])|(3[01]))(\s|-|.|\/)([0-9]{4}|[0-9]{2}))|((0?[1-9])|([12][0-9])|(3[01])) ((jan(uary)?)|(feb(ruary)?)|(mar(ch)?)|(apr(il)?)|(may)|(june?)|(july?)|(aug(ust)?)|(sep(tember)?)|(oct(ober)?)|(nov(ember)?)|(dec(ember)?))  
attribute: content  

9. connect Projection with Regex_Date  

10. create Join1  
Join attribute: content  
id attribute: _id (default)  
PredicateType: CharacterDistance (default)  
distance: 100   

11. connect Regex_Person and NLP_Location with Join1  

12. create Join2  
(same properties as Join1)  

13. connect Join1 and Regex_Date with Join2  

14. create TupleStreamSink (to view results)  

15. connect Join2 with TupleStreamSink
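
For a quick check outside Texera, the patterns from steps 4 and 8 can be run with Python's `re` module, together with a rough stand-in for the CharacterDistance join from step 10. This is a sketch under assumptions: `regex_spans` and `character_distance_join` are hypothetical helpers, the join rule (start offsets and end offsets each within `distance` characters, output is the covering span) is a guess at what CharacterDistance does rather than Texera's code, the sample sentence is made up, and NLP_Location is skipped because it needs an NER model, so Regex_Person is joined with Regex_Date directly.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets; end is exclusive

# Step 4 (Regex_Person), copied verbatim.
PERSON_REGEX = r"(A|a|(an)|(An)) .{1,40} ((woman)|(man))"

# Step 8 (Regex_Date), rejoined into a single pattern.
DATE_REGEX = (
    r"(((0?[1-9])|(1[0-2]))(\s|-|.|\/)((0?[1-9])|([12][0-9])|(3[01]))"
    r"(\s|-|.|\/)([0-9]{4}|[0-9]{2}))|((0?[1-9])|([12][0-9])|(3[01])) "
    r"((jan(uary)?)|(feb(ruary)?)|(mar(ch)?)|(apr(il)?)|(may)|(june?)|"
    r"(july?)|(aug(ust)?)|(sep(tember)?)|(oct(ober)?)|(nov(ember)?)|"
    r"(dec(ember)?))"
)


def regex_spans(pattern: str, text: str) -> List[Span]:
    """Hypothetical helper: offsets of every non-overlapping match."""
    return [m.span() for m in re.finditer(pattern, text)]


def character_distance_join(left: List[Span], right: List[Span], distance: int) -> List[Span]:
    """Hypothetical CharacterDistance join (assumed rule): keep a pair when the
    start offsets and the end offsets each differ by at most `distance`,
    and emit the span covering both."""
    joined = []
    for l_start, l_end in left:
        for r_start, r_end in right:
            if abs(l_start - r_start) <= distance and abs(l_end - r_end) <= distance:
                joined.append((min(l_start, r_start), max(l_end, r_end)))
    return joined


# Made-up sample; the real plan applies this to the `content` attribute of each tuple.
text = "A 45-year-old woman from Miami tested positive on 12/25/2016."
persons = regex_spans(PERSON_REGEX, text)
dates = regex_spans(DATE_REGEX, text)
print(persons, dates, character_distance_join(persons, dates, 100))
# [(0, 19)] [(50, 60)] [(0, 60)]
```

With a distance of 100 the person and date spans are joined; with a distance of 10 they are not. The `.` inside `(\s|-|.|\/)` is unescaped, so it matches any character and a string like `1x2x16` is also accepted as a date; `\.` would restrict that separator to a literal dot.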


Here's a screenshot of the query plan: 
![](https://cloud.githubusercontent.com/assets/12578068/22418680/e257b9b8-e68e-11e6-8139-0f491c35c92c.png)
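
The "connect" steps (3, 5, 7, 9, 11, 13, 15) wire the operators into a small directed acyclic graph. As a rough sketch in plain Python (the dictionary below is an illustrative encoding, not Texera's own plan format), each operator can be mapped to the operators feeding it, and the standard-library `graphlib.TopologicalSorter` (Python 3.9+) then yields an order in which every operator comes after its inputs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Operator -> operators whose output it consumes, following the "connect" steps.
# Illustrative encoding only, not Texera's plan format.
PLAN = {
    "KeywordSource": set(),
    "Projection": {"KeywordSource"},             # step 3
    "Regex_Person": {"Projection"},              # step 5
    "NLP_Location": {"Projection"},              # step 7
    "Regex_Date": {"Projection"},                # step 9
    "Join1": {"Regex_Person", "NLP_Location"},   # step 11
    "Join2": {"Join1", "Regex_Date"},            # step 13
    "TupleStreamSink": {"Join2"},                # step 15
}

# static_order() raises graphlib.CycleError if the wiring contains a cycle.
order = list(TopologicalSorter(PLAN).static_order())
print(order)  # KeywordSource first, TupleStreamSink last; independent operators may appear in any order
```

Both joins take exactly two inputs, and Join2 depends on Join1, so the sink only sees tuples that passed both character-distance joins.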

GitHub link: https://github.com/apache/texera/discussions/3984
