Yes, sure. Thanks.
On Mon, Sep 1, 2014 at 2:42 PM, Srinath Perera <[email protected]> wrote:

> How about 2pm? (Someone had a conflict in the AM)
>
> On Mon, Sep 1, 2014 at 2:40 PM, Srinath Perera <[email protected]> wrote:
>
>> Can we meet and discuss? How about tomorrow 11am?
>>
>> On Thu, Aug 28, 2014 at 6:49 PM, Malithi Edirisinghe <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I have looked in detail at how Stanford NLP extracts grammatical dependencies, and I have the following concerns regarding the implementation of the 3rd query (findRelationship(sentence, regex)).
>>>
>>> When a sentence is given, Stanford NLP can recognise around 50 grammatical relationships. I have listed some of them with simple examples below.
>>>
>>> - acomp: adjectival complement
>>>
>>> This is an adjectival phrase which functions as the complement of the verb (like an object of the verb).
>>>
>>> ex:
>>>
>>> “She looks very beautiful” -> acomp(looks, beautiful)
>>>
>>> - agent
>>>
>>> This is the complement of a passive verb which is introduced by the preposition “by” and does the action.
>>>
>>> ex:
>>>
>>> “The man has been killed by the police” -> agent(killed, police)
>>> “Effects caused by the protein are important” -> agent(caused, protein)
>>>
>>> - aux: auxiliary
>>>
>>> This is a non-main verb of the clause.
>>>
>>> ex:
>>>
>>> “Reagan has died” -> aux(died, has)
>>> “He should leave” -> aux(leave, should)
>>>
>>> - conj: conjunct
>>>
>>> This is the relation between two elements connected by a coordinating conjunction, such as “and”, “or”, etc.
>>>
>>> ex:
>>>
>>> “Bill is big and honest” -> conj(big, honest)
>>> “They either ski or snowboard” -> conj(ski, snowboard)
>>>
>>> - dobj: direct object
>>>
>>> This is the noun phrase which is the object of the verb.
>>>
>>> ex:
>>>
>>> “They win the lottery” -> dobj(win, lottery)
>>>
>>> - nsubj: nominal subject
>>>
>>> This is a noun phrase which is the syntactic subject of a clause.
>>>
>>> ex:
>>>
>>> “The baby is cute” -> nsubj(cute, baby)
>>>
>>> Given this library support, I would like to clarify the following:
>>>
>>> 1. How should we use the regular expression to extract the relationship, given that the library already extracts relationships itself?
>>> 2. What kind of relationships should we extract? For example, is it just simple relationships such as identifying the subject, verb and object, or something else?
>>>
>>> Kindly share your thoughts on this.
>>>
>>> Thanks,
>>> Malithi.
>>>
>>> On Fri, Aug 22, 2014 at 6:11 PM, Malithi Edirisinghe <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> We started the implementation with Stanford NLP for the reasons below.
>>>>
>>>> 1. Stanford NLP provides rich regular expression support for writing patterns over tokens, rather than working at the character level with normal Java regular expressions.
>>>>
>>>> 2. Stanford NLP can extract grammatical relationships from the parse tree, so we can easily implement the 3rd query.
>>>>
>>>> Thanks,
>>>>
>>>> Malithi.
>>>>
>>>> On Thu, Aug 21, 2014 at 12:58 PM, Malithi Edirisinghe <[email protected]> wrote:
>>>>
>>>>> Hi Suho,
>>>>>
>>>>> Since Named Entity Recognition is supported by both libraries, we can implement the first function with either of them. Both can identify entities like person, location, organization, etc. For the fourth function, we found that we can simply define dictionaries in OpenNLP. There is a class called DictionaryNameFinder which takes a Dictionary and identifies any entry in the sentence that matches the dictionary. In Stanford NLP, we found that there is an implementation of a Dictionary, but we couldn't yet find a way of using it for our requirement. It lacks samples, and it seems we should look into their code to find out how they have used it. We will work on it.
>>>>> Anyhow, I think it should be possible to define such a Dictionary in Stanford NLP as well.
>>>>>
>>>>> Thanks,
>>>>> Malithi.
>>>>>
>>>>> On Thu, Aug 21, 2014 at 10:09 AM, Sriskandarajah Suhothayan <[email protected]> wrote:
>>>>>
>>>>>> That's a good comparison.
>>>>>> Based on this, I believe we have issues implementing functions 2 & 3 using OpenNLP.
>>>>>> Can you evaluate the other functions as well?
>>>>>>
>>>>>> Suho
>>>>>>
>>>>>> On Thu, Aug 21, 2014 at 9:54 AM, Chanuka Dissanayake <[email protected]> wrote:
>>>>>>
>>>>>>> We did a study of both the OpenNLP and Stanford NLP libraries and looked at the features that could support our implementation. Our findings are summarised below.
>>>>>>>
>>>>>>> It seems that Stanford NLP has better capabilities when considering support for regular expressions and parsing. We would like to discuss this further and choose the appropriate library.
>>>>>>>
>>>>>>> Feature: Named Entity Recognizer
>>>>>>> - OpenNLP: Identifies person, location, organization, time, date, money and percentage inside the given sentence, but the sentence needs to be tokenized first.
>>>>>>> - Stanford NLP: Includes a 4 class model trained for CoNLL, a 7 class model trained for MUC, and a 3 class model trained on both data sets for the intersection of those class sets.
>>>>>>>     3 class: Location, Person, Organization
>>>>>>>     4 class: Location, Person, Organization, Misc
>>>>>>>     7 class: Time, Location, Organization, Person, Money, Percent, Date
>>>>>>>
>>>>>>> Feature: POS Tagger
>>>>>>> - OpenNLP: Identifies POS tags such as NNP (proper noun), VBZ (verb, 3rd person singular present), JJ (adjective), etc.
>>>>>>>     Input: Hi. How are you? This is Mike
>>>>>>>     Output: Hi_NNP How_WRB are_VBP you?_JJ This_DT is_VBZ Mike._NNP
>>>>>>> - Stanford NLP: Labels each token with its POS tag, such as noun, verb, adjective, etc.
>>>>>>>
>>>>>>> Feature: Tokenizing
>>>>>>> - OpenNLP: By default, separates words that have whitespace between them. Otherwise, it can be trained to tokenize with different options.
>>>>>>> - Stanford NLP: Can tokenize the text either by whitespace or as per the options defined.
>>>>>>>
>>>>>>> Feature: Parsing
>>>>>>> - OpenNLP: Given a tokenized sentence, it constructs the parse tree.
>>>>>>> - Stanford NLP: Works out the grammatical structure of sentences as a tree. The parser also provides Stanford Dependencies, which represent the grammatical relations between words in a sentence. Dependencies are triplets: name of the relation, governor and dependent.
>>>>>>>     Ex: Bell, based in Los Angeles, makes and distributes electronic, computer and building products.
>>>>>>>     Dependency: nsubj(distributes-10, Bell-1)
>>>>>>>     This is like saying “the subject of distributes is Bell.”
>>>>>>>
>>>>>>> Feature: Sentence Detection
>>>>>>> - OpenNLP: Detects sentence boundaries given a paragraph.
>>>>>>> - Stanford NLP: Available as ssplit. Can split sentences as per the options defined.
>>>>>>>
>>>>>>> Feature: Regular Expressions
>>>>>>> - OpenNLP: Character-level regular expressions only. Cannot identify named entities or POS tags via regular expressions.
>>>>>>> - Stanford NLP: Two tools are provided to deal with regular expressions.
>>>>>>>     RegexNER: Can define simple rules with regular expressions to label entities with NE labels that are not provided by default.
>>>>>>>       Ex: Bachelor of (Arts|Laws|Science|Engineering)    DEGREE
>>>>>>>       This rule labels tokens matching the regex in the first column as DEGREE.
>>>>>>>     TokensRegex: Can identify patterns over a list of tokens. In addition to Java regex matching, it provides syntax to match part-of-speech tags, named entity tags and lemmas.
>>>>>>>       Ex: [ { tag:VBD } ] and /University/ /of/ [ { ner:LOCATION } ]
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Chanuka.
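The comparison's point about OpenNLP's character-level regular expressions versus Stanford's TokensRegex (patterns over annotated tokens) can be illustrated without either library. The sketch below is a hand-rolled toy in Java 16+, not the TokensRegex API: the `TokenPatternSketch` class, the `Token` record, the `matchTokens` helper and the POS/NER annotations on the sample sentence are all hypothetical and assigned by hand. Each pattern element is simply a predicate over one token, with no wildcards or repetition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class TokenPatternSketch {

    // Hypothetical annotated token: surface word, POS tag and NER label.
    record Token(String word, String tag, String ner) {}

    // Returns the start index of every window of consecutive tokens that satisfies
    // the pattern, where the pattern is one predicate per token.
    static List<Integer> matchTokens(List<Token> tokens, List<Predicate<Token>> pattern) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i + pattern.size() <= tokens.size(); i++) {
            boolean matched = true;
            for (int j = 0; j < pattern.size() && matched; j++) {
                matched = pattern.get(j).test(tokens.get(i + j));
            }
            if (matched) {
                starts.add(i);
            }
        }
        return starts;
    }

    // Annotations are hand-assigned for illustration; a real pipeline would produce them.
    static List<Token> sampleTokens() {
        return List.of(
                new Token("She", "PRP", "O"),
                new Token("studied", "VBD", "O"),
                new Token("at", "IN", "O"),
                new Token("University", "NNP", "O"),
                new Token("of", "IN", "O"),
                new Token("Moratuwa", "NNP", "LOCATION"));
    }

    // Roughly mirrors the TokensRegex example /University/ /of/ [ { ner:LOCATION } ]
    static List<Predicate<Token>> universityOfLocation() {
        return List.of(
                t -> t.word().matches("University"),
                t -> t.word().matches("of"),
                t -> t.ner().equals("LOCATION"));
    }

    // Roughly mirrors the TokensRegex example [ { tag:VBD } ]
    static List<Predicate<Token>> pastTenseVerb() {
        return List.of(t -> t.tag().equals("VBD"));
    }

    public static void main(String[] args) {
        System.out.println(matchTokens(sampleTokens(), universityOfLocation())); // [3]
        System.out.println(matchTokens(sampleTokens(), pastTenseVerb()));        // [1]
    }
}
```

A plain `java.util.regex` pattern over the raw string could find the text “University of Moratuwa”, but it has no way to ask whether “Moratuwa” carries a LOCATION label; that information only exists once the sentence has been tokenized and annotated.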
>>>>>>>
>>>>>>> On Tue, Aug 19, 2014 at 11:11 PM, Sriskandarajah Suhothayan <[email protected]> wrote:
>>>>>>>
>>>>>>>> +1 looks good
>>>>>>>>
>>>>>>>> Suho
>>>>>>>>
>>>>>>>> On Tue, Aug 19, 2014 at 9:56 PM, Srinath Perera <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Looks good. If possible, we should do this with OpenNLP as it has the Apache licence. However, I could not find an NLP regex implementation there. Please look at it in detail.
>>>>>>>>>
>>>>>>>>> --Srinath
>>>>>>>>>
>>>>>>>>> On Tue, Aug 19, 2014 at 9:52 PM, Malithi Edirisinghe <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> We are working on an NLP Toolbox improvement in CEP. The main idea of this improvement is to use an NLP library and let users perform NLP operations as Siddhi extensions.
>>>>>>>>>>
>>>>>>>>>> In our implementation, we have decided to support the following NLP operations.
>>>>>>>>>>
>>>>>>>>>> *1. findNameEntityType(sentence, entityType)*
>>>>>>>>>>
>>>>>>>>>> *Description:*
>>>>>>>>>>
>>>>>>>>>> This operation takes a sentence and a predefined entity type as its inputs. It will return the noun(s) in the sentence that match the defined entity type, as event(s).
>>>>>>>>>>
>>>>>>>>>> *inputs:*
>>>>>>>>>>
>>>>>>>>>> sentence : sentence to be processed
>>>>>>>>>> entityType : predefined entity type
>>>>>>>>>>     ORGANIZATION
>>>>>>>>>>     NAME
>>>>>>>>>>     LOCATION
>>>>>>>>>>
>>>>>>>>>> *output:*
>>>>>>>>>>
>>>>>>>>>> matching noun(s) as event(s)
>>>>>>>>>>
>>>>>>>>>> *example:*
>>>>>>>>>>
>>>>>>>>>> inputs:
>>>>>>>>>> sentence : Alice works at WSO2
>>>>>>>>>> entityType : NAME
>>>>>>>>>>
>>>>>>>>>> output: Alice
>>>>>>>>>>
>>>>>>>>>> *2. findNLRegexPattern(sentence, regex)*
>>>>>>>>>>
>>>>>>>>>> *Description:*
>>>>>>>>>>
>>>>>>>>>> This operation takes a sentence and a regular expression as its inputs.
>>>>>>>>>> It will return each match in the sentence as an event.
>>>>>>>>>>
>>>>>>>>>> *inputs:*
>>>>>>>>>>
>>>>>>>>>> sentence : sentence to be processed
>>>>>>>>>> regex : regular expression to be matched
>>>>>>>>>>
>>>>>>>>>> *output:*
>>>>>>>>>>
>>>>>>>>>> matching phrase(s) as event(s)
>>>>>>>>>>
>>>>>>>>>> *example:*
>>>>>>>>>>
>>>>>>>>>> inputs:
>>>>>>>>>> sentence : WSO2 was founded in 2005
>>>>>>>>>> regex : \\d{4}
>>>>>>>>>>
>>>>>>>>>> output: 2005
>>>>>>>>>>
>>>>>>>>>> *3. findRelationship(sentence, regex)*
>>>>>>>>>>
>>>>>>>>>> *Description:*
>>>>>>>>>>
>>>>>>>>>> This operation takes a sentence and a regular expression as its inputs. For each relationship extracted using the regular expression, the operation will return a triplet (subject, object, relationship) as an event.
>>>>>>>>>>
>>>>>>>>>> *inputs:*
>>>>>>>>>>
>>>>>>>>>> sentence : sentence to be processed
>>>>>>>>>> regex : regular expression to extract the relationship
>>>>>>>>>>
>>>>>>>>>> *output:*
>>>>>>>>>>
>>>>>>>>>> triplet(s) of (subject, object, relationship) as event(s)
>>>>>>>>>>
>>>>>>>>>> *example:*
>>>>>>>>>>
>>>>>>>>>> inputs:
>>>>>>>>>> sentence : Bob works for WSO2
>>>>>>>>>> regex : works for
>>>>>>>>>>
>>>>>>>>>> output: (Bob, WSO2, works for)
>>>>>>>>>>
>>>>>>>>>> *4. findNameEntityTypeViaDictionary(sentence, dictionary, entityType)*
>>>>>>>>>>
>>>>>>>>>> *Description:*
>>>>>>>>>>
>>>>>>>>>> This operation takes a sentence, a dictionary file and a predefined entity type as its inputs. It will return the noun(s) in the sentence of the defined entity type that also exist in the dictionary, as event(s).
>>>>>>>>>>
>>>>>>>>>> *inputs:*
>>>>>>>>>>
>>>>>>>>>> sentence : sentence to be processed
>>>>>>>>>> dictionary : dictionary of entities of the defined entity type
>>>>>>>>>> entityType : predefined entity type
>>>>>>>>>>     ORGANIZATION
>>>>>>>>>>     NAME
>>>>>>>>>>     LOCATION
>>>>>>>>>>
>>>>>>>>>> *output:*
>>>>>>>>>>
>>>>>>>>>> matching noun(s) as event(s)
>>>>>>>>>>
>>>>>>>>>> *example:*
>>>>>>>>>>
>>>>>>>>>> inputs:
>>>>>>>>>> sentence : Bob works at WSO2
>>>>>>>>>> dictionary : (WSO2,ORACLE,IBM)
>>>>>>>>>> entityType : ORGANIZATION
>>>>>>>>>>
>>>>>>>>>> output: WSO2
>>>>>>>>>>
>>>>>>>>>> Each NLP operation defined here will be implemented as a transformer extension to Siddhi.
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> *Malithi Edirisinghe*
>>>>>>>>>> Senior Software Engineer
>>>>>>>>>> WSO2 Inc.
>>>>>>>>>>
>>>>>>>>>> Mobile : +94 (0) 718176807
>>>>>>>>>> [email protected]
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> ============================
>>>>>>>>> Director, Research, WSO2 Inc.
>>>>>>>>> Visiting Faculty, University of Moratuwa
>>>>>>>>> Member, Apache Software Foundation
>>>>>>>>> Research Scientist, Lanka Software Foundation
>>>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>>>> Phone: 0772360902
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> *S. Suhothayan*
>>>>>>>> Technical Lead & Team Lead of WSO2 Complex Event Processor
>>>>>>>> *WSO2 Inc. *http://wso2.com
>>>>>>>> lean . enterprise . middleware
>>>>>>>>
>>>>>>>> *cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/ twitter: http://twitter.com/suhothayan | linked-in: http://lk.linkedin.com/in/suhothayan*
>>>>>>>
>>>>>>> --
>>>>>>> Chanuka Dissanayake
>>>>>>> *Software Engineer | **WSO2 Inc.*; http://wso2.com
>>>>>>>
>>>>>>> Mobile: +94 71 33 63 596
>>>>>>> Email: [email protected]

--
Chanuka Dissanayake
*Software Engineer | **WSO2 Inc.*; http://wso2.com
Mobile: +94 71 33 63 596
Email: [email protected]
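The input/output contracts of operations 2 to 4 in the proposal can be pinned down with a JDK-only sketch (Java 16+). It is not the Siddhi transformer extension and uses no NLP library; the `NlpOperationsSketch` class, the `Relationship` record and the nearest-word heuristic in `findRelationship` are illustrative assumptions. `findNameEntityTypeViaDictionary` only checks dictionary membership and does not verify the entity type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NlpOperationsSketch {

    // Hypothetical output event for findRelationship: (subject, object, relationship).
    record Relationship(String subject, String object, String relationship) {}

    // Operation 2, character-level only: each java.util.regex match becomes one result.
    static List<String> findNLRegexPattern(String sentence, String regex) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(sentence);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    // Operation 3, naive heuristic (an assumption, not the proposal's design): the word
    // just before each match is the subject and the word just after it is the object.
    static List<Relationship> findRelationship(String sentence, String regex) {
        List<Relationship> triplets = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(sentence);
        while (m.find()) {
            String[] before = sentence.substring(0, m.start()).trim().split("\\s+");
            String[] after = sentence.substring(m.end()).trim().split("\\s+");
            String subject = before[before.length - 1];
            String object = after[0];
            // Skip matches with nothing on one side (split of "" yields a single "").
            if (!subject.isEmpty() && !object.isEmpty()) {
                triplets.add(new Relationship(subject, object, m.group()));
            }
        }
        return triplets;
    }

    // Operation 4: returns the whitespace-separated tokens (edge punctuation stripped) that
    // appear in the dictionary. entityType only mirrors the proposed signature; this sketch
    // assumes every dictionary entry already has that type and runs no NER model.
    static List<String> findNameEntityTypeViaDictionary(
            String sentence, Set<String> dictionary, String entityType) {
        List<String> found = new ArrayList<>();
        for (String raw : sentence.split("\\s+")) {
            String token = raw.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (dictionary.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findNLRegexPattern("WSO2 was founded in 2005", "\\d{4}")); // [2005]
        System.out.println(findRelationship("Bob works for WSO2", "works for"));
        System.out.println(findNameEntityTypeViaDictionary(
                "Bob works at WSO2", Set.of("WSO2", "ORACLE", "IBM"), "ORGANIZATION")); // [WSO2]
    }
}
```

The naive `findRelationship` breaks as soon as the subject or object spans several words (for example “Bank of America”), which is where grammatical dependencies such as nsubj and dobj, rather than the regex alone, would have to supply the triplet.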
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
