Can we meet and discuss? How about tomorrow 11am?
On Thu, Aug 28, 2014 at 6:49 PM, Malithi Edirisinghe <[email protected]> wrote:
> Hi,
>
> I have looked in detail at how Stanford NLP extracts grammatical dependencies, and I have the following concerns regarding the implementation of the 3rd query (findRelationship(sentence, regex)).
>
> When a sentence is given, Stanford NLP can recognise around 50 grammatical relationships. I have listed some with simple examples below.
>
> - acomp: adjectival complement
>
> This is an adjectival phrase which functions as the complement (like an object of the verb).
>
> ex:
> “She looks very beautiful” -> acomp(looks, beautiful)
>
> - agent
>
> This is a complement of a passive verb which is introduced by the preposition “by” and does the action.
>
> ex:
> “The man has been killed by the police” -> agent(killed, police)
> “Effects caused by the protein are important” -> agent(caused, protein)
>
> - aux: auxiliary
>
> This is a non-main verb of the clause.
>
> ex:
> “Reagan has died” -> aux(died, has)
> “He should leave” -> aux(leave, should)
>
> - conj: conjunct
>
> This is the relation between two elements connected by a coordinating conjunction, such as “and”, “or”, etc.
>
> ex:
> “Bill is big and honest” -> conj(big, honest)
> “They either ski or snowboard” -> conj(ski, snowboard)
>
> - dobj: direct object
>
> This is the noun phrase which is the object of the verb.
>
> ex:
> “They win the lottery” -> dobj(win, lottery)
>
> - nsubj: nominal subject
>
> This is a noun phrase which is the syntactic subject of a clause.
>
> ex:
> “The baby is cute” -> nsubj(cute, baby)
>
> Given this library support, I would like to clarify the following.
>
> 1. How should we use the regular expression to extract the relationship, given that the library already extracts relationships itself?
> 2. What kinds of relationships should we extract? For example, is it just simple relationships such as identifying the subject, verb and object, or others as well?
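One possible answer to question 1, offered only as a sketch under stated assumptions: let the parser produce the dependency triples, pair each nsubj with a dobj or collapsed prep_* dependent of the same governor, and use the user's regular expression only to filter the resulting relationship phrase. The Dependency and Triplet records, the findRelationship signature and the prep_* handling are illustrative assumptions; the triples are passed in directly rather than obtained through the Stanford NLP API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: the dependency triples are supplied by the caller (e.g. written
// by hand for testing); a real implementation would obtain them from a dependency parser.
final class RelationExtractor {

    /** A dependency triple such as nsubj(works, Bob): relation name, governor, dependent. */
    record Dependency(String relation, String governor, String dependent) {}

    /** The (subject, object, relationship) triplet proposed as the output event. */
    record Triplet(String subject, String object, String relationship) {}

    /**
     * Pairs each nominal subject (nsubj) with a direct object (dobj) or prepositional
     * object (prep_*) of the same governor, and keeps the triplet only when the whole
     * relationship phrase matches the user's regex.
     */
    static List<Triplet> findRelationship(List<Dependency> deps, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<Triplet> result = new ArrayList<>();
        for (Dependency subject : deps) {
            if (!subject.relation().equals("nsubj")) {
                continue;
            }
            for (Dependency object : deps) {
                if (!object.governor().equals(subject.governor())) {
                    continue;
                }
                String relationship;
                if (object.relation().equals("dobj")) {
                    relationship = object.governor();                  // e.g. "win"
                } else if (object.relation().startsWith("prep_")) {
                    // Collapsed prepositional relation: verb + preposition, e.g. "works for".
                    relationship = object.governor() + " "
                            + object.relation().substring("prep_".length());
                } else {
                    continue;                                          // ignore other relations
                }
                if (pattern.matcher(relationship).matches()) {
                    result.add(new Triplet(subject.dependent(), object.dependent(), relationship));
                }
            }
        }
        return result;
    }
}
```

Assuming the parser yields nsubj(works, Bob) and prep_for(works, WSO2) for “Bob works for WSO2”, the regex "works for" produces (Bob, WSO2, works for), as in the proposed example. Matching governors by word alone is a simplification; a fuller version would compare the token indices that Stanford Dependencies attach to each word.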
>
> Looking forward to your thoughts on this.
>
> Thanks,
> Malithi.
>
> On Fri, Aug 22, 2014 at 6:11 PM, Malithi Edirisinghe <[email protected]> wrote:
>> Hi,
>>
>> We started the implementation with Stanford NLP for the reasons below.
>>
>> 1. Stanford NLP provides rich regular expression support for writing patterns over tokens, rather than working at the character level with normal Java regular expressions.
>>
>> 2. Stanford NLP can extract grammatical relationships from the parse tree, so we can easily implement the 3rd query.
>>
>> Thanks,
>> Malithi.
>>
>> On Thu, Aug 21, 2014 at 12:58 PM, Malithi Edirisinghe <[email protected]> wrote:
>>> Hi Suho,
>>>
>>> Since Named Entity Recognition is supported by both libraries, we can implement the first function with either of them. Both can identify entities like person, location, organization, etc. For the fourth function, we found that we can simply define dictionaries in OpenNLP. There is a class called DictionaryNameFinder which takes a Dictionary and identifies any entry in the sentence that matches the dictionary. In Stanford NLP, we found that there is an implementation of a Dictionary, but we couldn't yet find a way of using it for our requirement. It lacks samples, and it seems we would have to look into their code to find out how they have used it. We will work on it. Anyhow, I think it should be possible to define such a Dictionary in Stanford NLP as well.
>>>
>>> Thanks,
>>> Malithi.
>>>
>>> On Thu, Aug 21, 2014 at 10:09 AM, Sriskandarajah Suhothayan <[email protected]> wrote:
>>>> That's a good comparison.
>>>> Based on this, I believe we have issues implementing functions 2 & 3 using OpenNLP.
>>>> Can you evaluate the other functions as well?
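The dictionary idea behind the fourth function can be illustrated without either library. The sketch below is a hypothetical stand-in, not OpenNLP's DictionaryNameFinder API: it assumes naive whitespace tokenization, a dictionary holding entries of a single entity type, and an illustrative maxTokens parameter, and it returns every token span found in the dictionary, preferring the longest span at each position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of dictionary-based entity lookup; not the OpenNLP or Stanford NLP API.
// Tokenization is naive whitespace splitting, so "WSO2." would not match the entry "WSO2".
final class DictionaryLookup {

    /**
     * Scans the sentence left to right and returns every span of up to maxTokens tokens
     * that appears in the dictionary, preferring the longest span at each position.
     */
    static List<String> findInDictionary(String sentence, Set<String> dictionary, int maxTokens) {
        String[] tokens = sentence.trim().split("\\s+");
        List<String> matches = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int matchedLength = 0;
            // Try the longest candidate span starting at token i first.
            for (int len = Math.min(maxTokens, tokens.length - i); len >= 1; len--) {
                String candidate = String.join(" ", Arrays.copyOfRange(tokens, i, i + len));
                if (dictionary.contains(candidate)) {
                    matches.add(candidate);
                    matchedLength = len;
                    break;
                }
            }
            // Skip past the matched span, or advance by one token if nothing matched.
            i += Math.max(matchedLength, 1);
        }
        return matches;
    }
}
```

With the sentence "Bob works at WSO2" and the dictionary {WSO2, ORACLE, IBM}, this returns [WSO2], the expected output in the proposal. Preferring the longest span means a multi-word entry such as "New York" is reported once rather than token by token, provided maxTokens is at least the entry's length.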
>>>>
>>>> Suho
>>>>
>>>> On Thu, Aug 21, 2014 at 9:54 AM, Chanuka Dissanayake <[email protected]> wrote:
>>>>> We did a study on both the OpenNLP and Stanford NLP libraries and looked at the features that could support our implementation. Our findings are summarised below.
>>>>>
>>>>> It seems that Stanford NLP has better capabilities when it comes to support for regular expressions and parsing.
>>>>> We would like to discuss this further and choose the appropriate library.
>>>>>
>>>>> Feature: Named Entity Recognizer
>>>>>   OpenNLP: Identifies person, location, organization, time, date, money and percentage inside the given sentence, but the sentence needs to be tokenized first.
>>>>>   StanfordNLP: Includes a 4 class model trained for CoNLL, a 7 class model trained for MUC, and a 3 class model trained on both data sets for the intersection of those class sets.
>>>>>     3 class: Location, Person, Organization
>>>>>     4 class: Location, Person, Organization, Misc
>>>>>     7 class: Time, Location, Organization, Person, Money, Percent, Date
>>>>>
>>>>> Feature: POS Tagger
>>>>>   OpenNLP: Identifies VP (Verb Phrase), NP (Noun Phrase), JJ (Adjective), etc.
>>>>>     Input: Hi. How are you? This is Mike
>>>>>     Output: Hi_NNP How_WRB are_VBP you?_JJ This_DT is_VBZ Mike._NNP
>>>>>   StanfordNLP: Labels each token with its POS tag, such as noun, verb, adjective, etc.
>>>>>
>>>>> Feature: Tokenizing
>>>>>   OpenNLP: By default, separates words that have white space between them. Otherwise, it can be trained to tokenize using different options.
>>>>>   StanfordNLP: Can tokenize the text either by white space or as per the options defined.
>>>>>
>>>>> Feature: Parsing
>>>>>   OpenNLP: Given a tokenized sentence, it constructs the tree structure.
>>>>>   StanfordNLP: Works out the grammatical structure of sentences as a tree. The parser provides Stanford Dependencies as well, which represent the grammatical relations between words in a sentence. Dependencies are triplets: name of the relation, governor and dependent.
>>>>>     Ex: Bell, based in Los Angeles, makes and distributes electronic, computer and building products.
>>>>>     Dependency: nsubj(distributes-10, Bell-1)
>>>>>     This is like saying “the subject of distributes is Bell.”
>>>>>
>>>>> Feature: Sentence Detection
>>>>>   OpenNLP: Detects sentence boundaries given a paragraph.
>>>>>   StanfordNLP: Available as ssplit. Can split sentences as per the options defined.
>>>>>
>>>>> Feature: Regular Expressions
>>>>>   OpenNLP: Character-level regular expressions only. Cannot identify named entities or POS tags via regular expressions.
>>>>>   StanfordNLP: Two tools are provided to deal with regular expressions.
>>>>>     RegexNER: Can define simple rules with regular expressions and label entities with NE labels that the default models do not provide.
>>>>>       Ex: Bachelor of (Arts|Laws|Science|Engineering) DEGREE
>>>>>       This rule labels tokens matching the regex in the first column as DEGREE.
>>>>>     TokensRegex: Can identify patterns over a list of tokens. In addition to Java regex matching, it provides syntax to match part-of-speech tags, named entity tags and lemmas.
>>>>>       Ex: [ { tag:VBD } ], /University/ /of/ [{ ner:LOCATION }]
>>>>>
>>>>> Thanks,
>>>>> Chanuka.
>>>>>
>>>>> On Tue, Aug 19, 2014 at 11:11 PM, Sriskandarajah Suhothayan <[email protected]> wrote:
>>>>>> +1 looks good
>>>>>>
>>>>>> Suho
>>>>>>
>>>>>> On Tue, Aug 19, 2014 at 9:56 PM, Srinath Perera <[email protected]> wrote:
>>>>>>> Looks good. If possible, we should do this with OpenNLP as it has an Apache licence. However, I could not find an NLP regex implementation there. Please look at it in detail.
>>>>>>>
>>>>>>> --Srinath
>>>>>>>
>>>>>>> On Tue, Aug 19, 2014 at 9:52 PM, Malithi Edirisinghe <[email protected]> wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> We are working on an NLP Toolbox improvement in CEP. The main idea of this improvement is to use an NLP library and let users perform NLP operations as Siddhi extensions.
>>>>>>>>
>>>>>>>> So in our implementation, we have decided to support the following NLP operations.
>>>>>>>>
>>>>>>>> *1. findNameEntityType(sentence, entityType)*
>>>>>>>>
>>>>>>>> *Description:*
>>>>>>>>
>>>>>>>> This operation takes a sentence and a predefined entity type as its inputs. It will return the noun(s) in the sentence that match the defined entity type, as event(s).
>>>>>>>>
>>>>>>>> *inputs:*
>>>>>>>>
>>>>>>>> sentence : sentence to be processed
>>>>>>>> entityType : predefined entity type, one of
>>>>>>>>     ORGANIZATION
>>>>>>>>     NAME
>>>>>>>>     LOCATION
>>>>>>>>
>>>>>>>> *output:*
>>>>>>>>
>>>>>>>> matching noun(s) as event(s)
>>>>>>>>
>>>>>>>> *example:*
>>>>>>>>
>>>>>>>> inputs:
>>>>>>>> sentence : Alice works at WSO2
>>>>>>>> entityType : NAME
>>>>>>>>
>>>>>>>> output: Alice
>>>>>>>>
>>>>>>>> *2. findNLRegexPattern(sentence, regex)*
>>>>>>>>
>>>>>>>> *Description:*
>>>>>>>>
>>>>>>>> This operation takes a sentence and a regular expression as its inputs. It will return each match in the sentence as an event.
>>>>>>>>
>>>>>>>> *inputs:*
>>>>>>>>
>>>>>>>> sentence : sentence to be processed
>>>>>>>> regex : regular expression to be matched
>>>>>>>>
>>>>>>>> *output:*
>>>>>>>>
>>>>>>>> matching phrase(s) as event(s)
>>>>>>>>
>>>>>>>> *example:*
>>>>>>>>
>>>>>>>> inputs:
>>>>>>>> sentence : WSO2 was founded in 2005
>>>>>>>> regex : \\d{4}
>>>>>>>>
>>>>>>>> output: 2005
>>>>>>>>
>>>>>>>> *3. findRelationship(sentence, regex)*
>>>>>>>>
>>>>>>>> *Description:*
>>>>>>>>
>>>>>>>> This operation takes a sentence and a regular expression as its inputs. For each relationship extracted using the regular expression, the operation will return a triplet (subject, object, relationship) as an event.
>>>>>>>>
>>>>>>>> *inputs:*
>>>>>>>>
>>>>>>>> sentence : sentence to be processed
>>>>>>>> regex : regular expression to extract the relationship
>>>>>>>>
>>>>>>>> *output:*
>>>>>>>>
>>>>>>>> triplet(s) of (subject, object, relationship) as event(s)
>>>>>>>>
>>>>>>>> *example:*
>>>>>>>>
>>>>>>>> inputs:
>>>>>>>> sentence : Bob works for WSO2
>>>>>>>> regex : works for
>>>>>>>>
>>>>>>>> output: (Bob, WSO2, works for)
>>>>>>>>
>>>>>>>> *4. findNameEntityTypeViaDictionary(sentence, dictionary, entityType)*
>>>>>>>>
>>>>>>>> *Description:*
>>>>>>>>
>>>>>>>> This operation takes a sentence, a dictionary file and a predefined entity type as its inputs. It will return, as event(s), the noun(s) in the sentence of the defined entity type that also exist in the dictionary.
>>>>>>>>
>>>>>>>> *inputs:*
>>>>>>>>
>>>>>>>> sentence : sentence to be processed
>>>>>>>> dictionary : dictionary of entities of the defined entity type
>>>>>>>> entityType : predefined entity type, one of
>>>>>>>>     ORGANIZATION
>>>>>>>>     NAME
>>>>>>>>     LOCATION
>>>>>>>>
>>>>>>>> *output:*
>>>>>>>>
>>>>>>>> matching noun(s) as event(s)
>>>>>>>>
>>>>>>>> *example:*
>>>>>>>>
>>>>>>>> inputs:
>>>>>>>> sentence : Bob works at WSO2
>>>>>>>> dictionary : (WSO2, ORACLE, IBM)
>>>>>>>> entityType : ORGANIZATION
>>>>>>>>
>>>>>>>> output: WSO2
>>>>>>>>
>>>>>>>> Each NLP operation defined here will be implemented as a transformer extension to Siddhi.
>>>>>>>> --
>>>>>>>>
>>>>>>>> *Malithi Edirisinghe*
>>>>>>>> Senior Software Engineer
>>>>>>>> WSO2 Inc.
>>>>>>>>
>>>>>>>> Mobile : +94 (0) 718176807
>>>>>>>> [email protected]
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ============================
>>>>>>> Director, Research, WSO2 Inc.
>>>>>>> Visiting Faculty, University of Moratuwa
>>>>>>> Member, Apache Software Foundation
>>>>>>> Research Scientist, Lanka Software Foundation
>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>> Phone: 0772360902
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> *S. Suhothayan*
>>>>>> Technical Lead & Team Lead of WSO2 Complex Event Processor
>>>>>> *WSO2 Inc.* http://wso2.com
>>>>>> lean . enterprise . middleware
>>>>>>
>>>>>> cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/ | twitter: http://twitter.com/suhothayan | linked-in: http://lk.linkedin.com/in/suhothayan
>>>>>
>>>>>
>>>>> --
>>>>> Chanuka Dissanayake
>>>>> *Software Engineer | WSO2 Inc.*; http://wso2.com
>>>>>
>>>>> Mobile: +94 71 33 63 596
>>>>> Email: [email protected]

--
============================
Director, Research, WSO2 Inc.
Visiting Faculty, University of Moratuwa
Member, Apache Software Foundation
Research Scientist, Lanka Software Foundation
Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
Site: http://people.apache.org/~hemapani/
Photos: http://www.flickr.com/photos/hemapani/
Phone: 0772360902
_______________________________________________ Architecture mailing list [email protected] https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
