Hi Suho,

Since Named Entity Recognition is supported by both libraries, we can implement the first function with either of them. Both can identify entities such as person, location and organization.

For the fourth function, we found that we can simply define dictionaries in OpenNLP. There is a class called DictionaryNameFinder which takes a Dictionary and identifies any entries in the sentence that match the dictionary.

In Stanford NLP, we found that there is an implementation of a dictionary, but we have not yet found a way to use it for our requirement. It lacks samples, so it seems we will have to look into their code to see how it is used. We will work on it. In any case, I think it should be possible to define such a dictionary in Stanford NLP as well.

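To make the intended behaviour of the fourth function concrete, here is a minimal sketch of dictionary-based matching. It is written in plain Python only for brevity and does not use the OpenNLP or Stanford NLP APIs. The function name find_dictionary_entities, the whitespace tokenisation and the case-sensitive matching are all assumptions made for illustration; the inputs are taken from the function 4 example in the original proposal quoted below.

```python
def find_dictionary_entities(sentence, dictionary):
    """Return the dictionary entries that occur in the sentence as whole-token spans.

    Hypothetical helper for illustration; not part of any NLP library.
    """
    # Naive whitespace tokenisation (an assumption; a real tokenizer would
    # also split off punctuation).
    tokens = sentence.split()
    # Tokenise each dictionary entry too, so multi-word entries can match.
    entries = [entry.split() for entry in dictionary]

    matches = []
    for start in range(len(tokens)):
        for entry in entries:
            end = start + len(entry)
            # Compare the token window against the entry (case-sensitive).
            if entry and tokens[start:end] == entry:
                matches.append(" ".join(entry))
    return matches


# Function 4 example from the proposal.
print(find_dictionary_entities("Bob works at WSO2", ["WSO2", "ORACLE", "IBM"]))
```

With real library support, the tokenisation and span matching would be delegated to the library, and each match would be emitted as a Siddhi event rather than collected in a list.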
Thanks,
Malithi.

On Thu, Aug 21, 2014 at 10:09 AM, Sriskandarajah Suhothayan <[email protected]> wrote:

> That's a good comparison.
> Based on this I believe we have issues in implementing functions 2 & 3
> using OpenNLP.
> Can you evaluate the other functions as well?
>
> Suho
>
>
> On Thu, Aug 21, 2014 at 9:54 AM, Chanuka Dissanayake <[email protected]>
> wrote:
>
>> We did a study of both the OpenNLP and Stanford NLP libraries and looked
>> at the features that could support our implementation.
>> Our findings are summarised below.
>>
>> It seems that Stanford NLP has better capabilities when considering
>> support for regular expressions and parsing.
>> We would like to discuss this further and choose the appropriate
>>
>>
>> Feature comparison: OpenNLP vs StanfordNLP
>>
>> Named Entity Recognizer
>>   OpenNLP: Identifies person, location, organization, time, date, money
>>     and percentage entities in the given sentence, but the sentence
>>     needs to be tokenized first.
>>   StanfordNLP: Includes a 4 class model trained for CoNLL, a 7 class
>>     model trained for MUC, and a 3 class model trained on both data sets
>>     for the intersection of those class sets.
>>     3 class: Location, Person, Organization
>>     4 class: Location, Person, Organization, Misc
>>     7 class: Time, Location, Organization, Person, Money, Percent, Date
>>
>> POS Tagger
>>   OpenNLP: Identifies VP (Verb Phrase), NP (Noun Phrase),
>>     JJ (Adjective), etc.
>>     Input: Hi. How are you? This is Mike
>>     Output: Hi_NNP How_WRB are_VBP you? _JJ This_DT is_VBZ Mike._NNP
>>   StanfordNLP: Labels each token with its POS tag, such as noun, verb,
>>     adjective, etc.
>>
>> Tokenizing
>>   OpenNLP: By default, separates words that have white space in between.
>>     Otherwise it can be trained to tokenize by different options.
>>   StanfordNLP: Can tokenize the text either by whitespace or as per the
>>     options defined.
>>
>> Parsing
>>   OpenNLP: Given a tokenized sentence, it will construct the tree
>>     structure.
>>   StanfordNLP: Works out the grammatical structure of sentences as a
>>     tree structure. The parser provides Stanford Dependencies as well.
>>     They represent the grammatical relations between words in a
>>     sentence. Dependencies are triplets: name of the relation, governor
>>     and dependent.
>>     Ex: Bell, based in Los Angeles, makes and distributes electronic,
>>     computer and building products.
>>     Dependency: nsubj(distributes-10, Bell-1)
>>     This is like saying “the subject of distributes is Bell.”
>>
>> Sentence Detection
>>   OpenNLP: Detects sentence boundaries given a paragraph.
>>   StanfordNLP: Available as ssplit. Can split sentences as per the
>>     options defined.
>>
>> Regular Expressions
>>   OpenNLP: Character-wise regular expressions only. Cannot identify
>>     named entities or PoS tags via regular expressions.
>>   StanfordNLP: Two tools are provided to deal with regular expressions.
>>     RegexNER: Can define simple rules with regular expressions and label
>>     entities with NE labels that are not provided.
>>     Ex: Bachelor of (Arts|Laws|Science|Engineering) DEGREE
>>     This rule will label tokens matching the regex in the first column
>>     as DEGREE.
>>     TokensRegex: Can identify patterns over a list of tokens. In
>>     addition to Java regex matching, this provides syntax to match part
>>     of speech tags, named entity tags and lemmas.
>>     Ex: [ { tag:VBD } ], /University/ /of/ [{ ner:LOCATION }]
>>
>>
>> Thanks,
>> Chanuka.
>>
>>
>> On Tue, Aug 19, 2014 at 11:11 PM, Sriskandarajah Suhothayan <
>> [email protected]> wrote:
>>
>>> +1 looks good
>>>
>>> Suho
>>>
>>>
>>> On Tue, Aug 19, 2014 at 9:56 PM, Srinath Perera <[email protected]>
>>> wrote:
>>>
>>>> Looks good. If possible we should do this with OpenNLP as it has the
>>>> Apache licence. However, I could not find an NLP regex impl there.
>>>> Please look at it in detail.
>>>>
>>>> --Srinath
>>>>
>>>>
>>>> On Tue, Aug 19, 2014 at 9:52 PM, Malithi Edirisinghe <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> We are working on an NLP Toolbox improvement in CEP. The main idea of
>>>>> this improvement is to use an NLP library and let users do some NLP
>>>>> operations as Siddhi extensions.
>>>>>
>>>>> So in our implementation we have decided to support the following NLP
>>>>> operations.
>>>>>
>>>>> *1. findNameEntityType(sentence, entityType)*
>>>>>
>>>>> *Description:*
>>>>>
>>>>> This operation takes a sentence and a predefined entity type as its
>>>>> inputs. It will return the noun(s) in the sentence that match the
>>>>> defined entity type, as event(s).
>>>>>
>>>>> *inputs:*
>>>>>
>>>>> sentence : sentence to be processed
>>>>> entityType : predefined entity type
>>>>>     ORGANIZATION
>>>>>     NAME
>>>>>     LOCATION
>>>>> *output:*
>>>>>
>>>>> matching noun(s) as event(s)
>>>>>
>>>>> *example:*
>>>>>
>>>>> inputs:
>>>>>     sentence : Alice works at WSO2
>>>>>     entityType : NAME
>>>>>
>>>>> output: Alice
>>>>>
>>>>> *2. findNLRegexPattern(sentence, regex)*
>>>>>
>>>>> *Description:*
>>>>>
>>>>> This operation takes a sentence and a regular expression as its
>>>>> inputs. It will return each match in the sentence, as an event.
>>>>>
>>>>> *inputs:*
>>>>>
>>>>> sentence : sentence to be processed
>>>>> regex : regular expression to be matched
>>>>> *output:*
>>>>>
>>>>> matching phrase(s) as event(s)
>>>>>
>>>>> *example:*
>>>>>
>>>>> inputs:
>>>>>     sentence : WSO2 was founded in 2005
>>>>>     regex : \\d{4}
>>>>>
>>>>> output: 2005
>>>>>
>>>>> *3. findRelationship(sentence, regex)*
>>>>>
>>>>> *Description:*
>>>>>
>>>>> This operation takes a sentence and a regular expression as its
>>>>> inputs. For each relationship extracted from the regular expression
>>>>> the operation will return a triplet (subject, object and
>>>>> relationship) as an event.
>>>>>
>>>>> *inputs:*
>>>>>
>>>>> sentence : sentence to be processed
>>>>> regex : regular expression to extract the relationship
>>>>> *output:*
>>>>>
>>>>> triplet(s) of (subject, object, relationship) as event(s)
>>>>>
>>>>> *example:*
>>>>>
>>>>> inputs:
>>>>>     sentence : Bob works for WSO2
>>>>>     regex : works for
>>>>>
>>>>> output: (Bob, WSO2, works for)
>>>>>
>>>>> *4. findNameEntityTypeViaDictionary(sentence, dictionary,
>>>>> entityType)*
>>>>>
>>>>> *Description:*
>>>>>
>>>>> This operation takes a sentence, a dictionary file and a predefined
>>>>> entity type as its inputs. It will return the noun(s) in the sentence
>>>>> of the defined entity type that also exist in the dictionary, as
>>>>> event(s).
>>>>>
>>>>> *inputs:*
>>>>>
>>>>> sentence : sentence to be processed
>>>>> dictionary : dictionary of entities of the defined entity type
>>>>> entityType : predefined entity type
>>>>>     ORGANIZATION
>>>>>     NAME
>>>>>     LOCATION
>>>>> *output:*
>>>>>
>>>>> matching noun(s) as event(s)
>>>>>
>>>>> *example:*
>>>>>
>>>>> inputs:
>>>>>     sentence : Bob works at WSO2
>>>>>     dictionary : (WSO2,ORACLE,IBM)
>>>>>     entityType : ORGANIZATION
>>>>>
>>>>> output: WSO2
>>>>>
>>>>> Each NLP operation defined here will be implemented as a transformer
>>>>> extension to Siddhi.
>>>>> --
>>>>>
>>>>> *Malithi Edirisinghe*
>>>>> Senior Software Engineer
>>>>> WSO2 Inc.
>>>>>
>>>>> Mobile : +94 (0) 718176807
>>>>> [email protected]
>>>>>
>>>>
>>>>
>>>> --
>>>> ============================
>>>> Director, Research, WSO2 Inc.
>>>> Visiting Faculty, University of Moratuwa
>>>> Member, Apache Software Foundation
>>>> Research Scientist, Lanka Software Foundation
>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>> Site: http://people.apache.org/~hemapani/
>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>> Phone: 0772360902
>>>>
>>>
>>>
>>> --
>>>
>>> *S. Suhothayan*
>>> Technical Lead & Team Lead of WSO2 Complex Event Processor
>>> *WSO2 Inc. *http://wso2.com
>>> lean . enterprise . middleware
>>>
>>> cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/
>>> twitter: http://twitter.com/suhothayan | linked-in:
>>> http://lk.linkedin.com/in/suhothayan
>>>
>>
>>
>> --
>> Chanuka Dissanayake
>> *Software Engineer | **WSO2 Inc.*; http://wso2.com
>>
>> Mobile: +94 71 33 63 596
>> Email: [email protected]
>>
>

--
*Malithi Edirisinghe*
Senior Software Engineer
WSO2 Inc.

Mobile : +94 (0) 718176807
[email protected]

_______________________________________________ Architecture mailing list [email protected] https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
