How about 2pm? (Someone had a conflict in the AM)
On Mon, Sep 1, 2014 at 2:40 PM, Srinath Perera <[email protected]> wrote:

> Can we meet and discuss? How about tomorrow 11am?
>
> On Thu, Aug 28, 2014 at 6:49 PM, Malithi Edirisinghe <[email protected]>
> wrote:
>
>> Hi,
>>
>> I have looked in detail at how Stanford NLP extracts grammatical
>> dependencies and have the following concerns regarding the
>> implementation of the 3rd query (findRelationship(sentence, regex)).
>>
>> Given a sentence, Stanford NLP can recognise around 50 grammatical
>> relationships. I have listed some below with simple examples.
>>
>> - acomp: adjectival complement
>>
>>   An adjectival phrase which functions as the complement (like an
>>   object of the verb).
>>
>>   ex:
>>   “She looks very beautiful” -> acomp(looks, beautiful)
>>
>> - agent
>>
>>   The complement of a passive verb which is introduced by the
>>   preposition “by” and does the action.
>>
>>   ex:
>>   “The man has been killed by the police” -> agent(killed, police)
>>   “Effects caused by the protein are important” -> agent(caused, protein)
>>
>> - aux: auxiliary
>>
>>   A non-main verb of the clause.
>>
>>   ex:
>>   "Reagan has died" -> aux(died, has)
>>   "He should leave" -> aux(leave, should)
>>
>> - conj: conjunct
>>
>>   The relation between two elements connected by a coordinating
>>   conjunction, such as “and”, “or”, etc.
>>
>>   ex:
>>   “Bill is big and honest” -> conj(big, honest)
>>   “They either ski or snowboard” -> conj(ski, snowboard)
>>
>> - dobj: direct object
>>
>>   The noun phrase which is the object of the verb.
>>
>>   ex:
>>   “They win the lottery” -> dobj(win, lottery)
>>
>> - nsubj: nominal subject
>>
>>   A noun phrase which is the syntactic subject of a clause.
>>
>>   ex:
>>   “The baby is cute” -> nsubj(cute, baby)
>>
>> Given this library support, I would like to clarify the following.
>>
>> 1. How should we use the regular expression to extract the
>>    relationship when the library extracts relationships itself?
>> 2. What kind of relationships should we extract? For example, is it
>>    just simple relationships such as identifying the subject, verb and
>>    object, or something else?
>>
>> I look forward to your thoughts on this.
>>
>> Thanks,
>> Malithi.
>>
>> On Fri, Aug 22, 2014 at 6:11 PM, Malithi Edirisinghe <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> We started the implementation with Stanford NLP for the reasons below.
>>>
>>> 1. Stanford NLP provides rich regular expression support for writing
>>>    patterns over tokens, rather than working at character level as
>>>    with normal Java regular expressions.
>>>
>>> 2. Stanford NLP can extract grammatical relationships from the parse
>>>    tree, so we can easily implement the 3rd query.
>>>
>>> Thanks,
>>> Malithi.
>>>
>>> On Thu, Aug 21, 2014 at 12:58 PM, Malithi Edirisinghe <[email protected]>
>>> wrote:
>>>
>>>> Hi Suho,
>>>>
>>>> Since Named Entity Recognition is supported by both libraries, we can
>>>> implement the first function with either of them. Both can identify
>>>> entities like person, location, organization, etc. For the fourth
>>>> function, we found that we can simply define dictionaries in OpenNLP.
>>>> There is a class called DictionaryNameFinder which takes a Dictionary
>>>> and identifies any entry in the sentence that matches the dictionary.
>>>> In Stanford NLP, we found that there is an implementation of a
>>>> Dictionary, but we have not yet found a way to use it for our
>>>> requirement. It lacks samples, and it seems we will have to look into
>>>> their code to find out how they have used it. We will work on it.
>>>> In any case, I think it should be possible to define such a
>>>> dictionary in Stanford NLP as well.
>>>>
>>>> Thanks,
>>>> Malithi.
>>>>
>>>> On Thu, Aug 21, 2014 at 10:09 AM, Sriskandarajah Suhothayan
>>>> <[email protected]> wrote:
>>>>
>>>>> That's a good comparison.
>>>>> Based on this, I believe we have issues in implementing functions
>>>>> 2 & 3 using OpenNLP.
>>>>> Can you evaluate the other functions as well?
>>>>>
>>>>> Suho
>>>>>
>>>>> On Thu, Aug 21, 2014 at 9:54 AM, Chanuka Dissanayake <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> We did a study of both the OpenNLP and Stanford NLP libraries and
>>>>>> looked at the features that could support our implementation.
>>>>>> Our findings are summarised below.
>>>>>>
>>>>>> It seems that Stanford NLP has better capabilities when considering
>>>>>> support for regular expressions and parsing.
>>>>>> We would like to discuss this further and choose the appropriate
>>>>>> library.
>>>>>>
>>>>>> Feature: Named Entity Recognizer
>>>>>>
>>>>>>   OpenNLP:
>>>>>>     Will identify person, location, organization, time, date, money
>>>>>>     and percentage entities in the given sentence, but the sentence
>>>>>>     needs to be tokenized first.
>>>>>>
>>>>>>   StanfordNLP:
>>>>>>     Includes a 4 class model trained for CoNLL, a 7 class model
>>>>>>     trained for MUC, and a 3 class model trained on both data sets
>>>>>>     for the intersection of those class sets.
>>>>>>     3 class: Location, Person, Organization
>>>>>>     4 class: Location, Person, Organization, Misc
>>>>>>     7 class: Time, Location, Organization, Person, Money, Percent,
>>>>>>     Date
>>>>>>
>>>>>> Feature: POS Tagger
>>>>>>
>>>>>>   OpenNLP:
>>>>>>     Identifies: VP (Verb Phrase), NP (Noun Phrase), JJ (Adjective),
>>>>>>     etc.
>>>>>>     Input: Hi. How are you? This is Mike
>>>>>>     Output: Hi_NNP How_WRB are_VBP you? _JJ This_DT is_VBZ Mike._NNP
>>>>>>
>>>>>>   StanfordNLP:
>>>>>>     Labels each token with its POS tag, such as noun, verb,
>>>>>>     adjective, etc.
>>>>>>
>>>>>> Feature: Tokenizing
>>>>>>
>>>>>>   OpenNLP:
>>>>>>     Separates words at white space by default. Otherwise it can be
>>>>>>     trained to tokenize according to different options.
>>>>>>
>>>>>>   StanfordNLP:
>>>>>>     Can tokenize the text either by whitespace or as per the options
>>>>>>     defined.
>>>>>>
>>>>>> Feature: Parsing
>>>>>>
>>>>>>   OpenNLP:
>>>>>>     Given a tokenized sentence, it will construct the tree
>>>>>>     structure.
>>>>>>
>>>>>>   StanfordNLP:
>>>>>>     Works out the grammatical structure of sentences as a tree. The
>>>>>>     parser provides Stanford Dependencies as well. They represent
>>>>>>     the grammatical relations between words in a sentence.
>>>>>>     Dependencies are triplets: name of the relation, governor and
>>>>>>     dependent.
>>>>>>     Ex: Bell, based in Los Angeles, makes and distributes
>>>>>>     electronic, computer and building products.
>>>>>>     Dependency: nsubj(distributes-10, Bell-1)
>>>>>>     This is like saying “the subject of distributes is Bell.”
>>>>>>
>>>>>> Feature: Sentence Detection
>>>>>>
>>>>>>   OpenNLP:
>>>>>>     Detects sentence boundaries given a paragraph.
>>>>>>
>>>>>>   StanfordNLP:
>>>>>>     Available as ssplit. Can split sentences as per the options
>>>>>>     defined.
>>>>>>
>>>>>> Feature: Regular Expressions
>>>>>>
>>>>>>   OpenNLP:
>>>>>>     Character-wise regular expressions only. Cannot identify named
>>>>>>     entities or PoS tags via regular expressions.
>>>>>>
>>>>>>   StanfordNLP:
>>>>>>     Two tools are provided to deal with regular expressions.
>>>>>>     RegexNER: Can define simple rules with regular expressions and
>>>>>>     label entities with NE labels that are not provided.
>>>>>>     Ex: Bachelor of (Arts|Laws|Science|Engineering) DEGREE
>>>>>>     This rule will label tokens matching the regex in the first
>>>>>>     column as DEGREE.
>>>>>>     TokensRegex: Can identify patterns over a list of tokens. In
>>>>>>     addition to Java regex matching, this provides syntax to match
>>>>>>     part of speech tags, named entity tags and lemmas.
>>>>>>     Ex: [ { tag:VBD } ], /University/ /of/ [{ ner:LOCATION }]
>>>>>>
>>>>>> Thanks,
>>>>>> Chanuka.
>>>>>>
>>>>>> On Tue, Aug 19, 2014 at 11:11 PM, Sriskandarajah Suhothayan
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> +1 looks good
>>>>>>>
>>>>>>> Suho
>>>>>>>
>>>>>>> On Tue, Aug 19, 2014 at 9:56 PM, Srinath Perera <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Looks good.
>>>>>>>> If possible we should do this with OpenNLP, as it has the Apache
>>>>>>>> licence. However, I could not find an NLP regex implementation
>>>>>>>> there. Please look at it in detail.
>>>>>>>>
>>>>>>>> --Srinath
>>>>>>>>
>>>>>>>> On Tue, Aug 19, 2014 at 9:52 PM, Malithi Edirisinghe
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> We are working on an NLP Toolbox improvement in CEP. The main
>>>>>>>>> idea of this improvement is to use an NLP library and let the
>>>>>>>>> user perform some NLP operations as Siddhi extensions.
>>>>>>>>>
>>>>>>>>> In our implementation we have decided to support the following
>>>>>>>>> NLP operations.
>>>>>>>>>
>>>>>>>>> *1. findNameEntityType(sentence, entityType)*
>>>>>>>>>
>>>>>>>>> *Description:*
>>>>>>>>>
>>>>>>>>> This operation takes a sentence and a predefined entity type as
>>>>>>>>> its inputs. It will return the noun(s) in the sentence that
>>>>>>>>> match the defined entity type, as event(s).
>>>>>>>>>
>>>>>>>>> *inputs:*
>>>>>>>>>
>>>>>>>>> sentence   : sentence to be processed
>>>>>>>>> entityType : predefined entity type
>>>>>>>>>              ORGANIZATION
>>>>>>>>>              NAME
>>>>>>>>>              LOCATION
>>>>>>>>>
>>>>>>>>> *output:*
>>>>>>>>>
>>>>>>>>> matching noun(s) as event(s)
>>>>>>>>>
>>>>>>>>> *example:*
>>>>>>>>>
>>>>>>>>> inputs:
>>>>>>>>> sentence   : Alice works at WSO2
>>>>>>>>> entityType : NAME
>>>>>>>>>
>>>>>>>>> output: Alice
>>>>>>>>>
>>>>>>>>> *2. findNLRegexPattern(sentence, regex)*
>>>>>>>>>
>>>>>>>>> *Description:*
>>>>>>>>>
>>>>>>>>> This operation takes a sentence and a regular expression as its
>>>>>>>>> inputs. It will return each match in the sentence as an event.
>>>>>>>>>
>>>>>>>>> *inputs:*
>>>>>>>>>
>>>>>>>>> sentence : sentence to be processed
>>>>>>>>> regex    : regular expression to be matched
>>>>>>>>>
>>>>>>>>> *output:*
>>>>>>>>>
>>>>>>>>> matching phrase(s) as event(s)
>>>>>>>>>
>>>>>>>>> *example:*
>>>>>>>>>
>>>>>>>>> inputs:
>>>>>>>>> sentence : WSO2 was found in 2005
>>>>>>>>> regex    : \\d{4}
>>>>>>>>>
>>>>>>>>> output: 2005
>>>>>>>>>
>>>>>>>>> *3. findRelationship(sentence, regex)*
>>>>>>>>>
>>>>>>>>> *Description:*
>>>>>>>>>
>>>>>>>>> This operation takes a sentence and a regular expression as its
>>>>>>>>> inputs. For each relationship extracted using the regular
>>>>>>>>> expression, the operation will return a triplet (subject,
>>>>>>>>> object, relationship) as an event.
>>>>>>>>>
>>>>>>>>> *inputs:*
>>>>>>>>>
>>>>>>>>> sentence : sentence to be processed
>>>>>>>>> regex    : regular expression to extract the relationship
>>>>>>>>>
>>>>>>>>> *output:*
>>>>>>>>>
>>>>>>>>> triplet(s) of (subject, object, relationship) as event(s)
>>>>>>>>>
>>>>>>>>> *example:*
>>>>>>>>>
>>>>>>>>> inputs:
>>>>>>>>> sentence : Bob works for WSO2
>>>>>>>>> regex    : works for
>>>>>>>>>
>>>>>>>>> output: (Bob, WSO2, works for)
>>>>>>>>>
>>>>>>>>> *4. findNameEntityTypeViaDictionary(sentence, dictionary,
>>>>>>>>> entityType)*
>>>>>>>>>
>>>>>>>>> *Description:*
>>>>>>>>>
>>>>>>>>> This operation takes a sentence, a dictionary file and a
>>>>>>>>> predefined entity type as its inputs. It will return the noun(s)
>>>>>>>>> in the sentence of the defined entity type that also exist in
>>>>>>>>> the dictionary, as event(s).
>>>>>>>>>
>>>>>>>>> *inputs:*
>>>>>>>>>
>>>>>>>>> sentence   : sentence to be processed
>>>>>>>>> dictionary : dictionary of entities of the defined entity type
>>>>>>>>> entityType : predefined entity type
>>>>>>>>>              ORGANIZATION
>>>>>>>>>              NAME
>>>>>>>>>              LOCATION
>>>>>>>>>
>>>>>>>>> *output:*
>>>>>>>>>
>>>>>>>>> matching noun(s) as event(s)
>>>>>>>>>
>>>>>>>>> *example:*
>>>>>>>>>
>>>>>>>>> inputs:
>>>>>>>>> sentence   : Bob works at WSO2
>>>>>>>>> dictionary : (WSO2,ORACLE,IBM)
>>>>>>>>> entityType : ORGANIZATION
>>>>>>>>>
>>>>>>>>> output: WSO2
>>>>>>>>>
>>>>>>>>> Each NLP operation defined here will be implemented as a
>>>>>>>>> transformer extension to Siddhi.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> *Malithi Edirisinghe*
>>>>>>>>> Senior Software Engineer
>>>>>>>>> WSO2 Inc.
>>>>>>>>>
>>>>>>>>> Mobile : +94 (0) 718176807
>>>>>>>>> [email protected]
>>>>>>>>
>>>>>>>> --
>>>>>>>> ============================
>>>>>>>> Director, Research, WSO2 Inc.
>>>>>>>> Visiting Faculty, University of Moratuwa
>>>>>>>> Member, Apache Software Foundation
>>>>>>>> Research Scientist, Lanka Software Foundation
>>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>>> Phone: 0772360902
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> *S. Suhothayan*
>>>>>>> Technical Lead & Team Lead of WSO2 Complex Event Processor
>>>>>>> *WSO2 Inc.* http://wso2.com
>>>>>>> lean . enterprise . middleware
>>>>>>>
>>>>>>> cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/
>>>>>>> twitter: http://twitter.com/suhothayan |
>>>>>>> linked-in: http://lk.linkedin.com/in/suhothayan
>>>>>>
>>>>>> --
>>>>>> Chanuka Dissanayake
>>>>>> *Software Engineer | WSO2 Inc.*; http://wso2.com
>>>>>>
>>>>>> Mobile: +94 71 33 63 596
>>>>>> Email: [email protected]

--
============================
Director, Research, WSO2 Inc.
Visiting Faculty, University of Moratuwa
Member, Apache Software Foundation
Research Scientist, Lanka Software Foundation
Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
Site: http://people.apache.org/~hemapani/
Photos: http://www.flickr.com/photos/hemapani/
Phone: 0772360902
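
For reference, the second operation proposed in the quoted thread, findNLRegexPattern(sentence, regex), can be sketched with an ordinary character-level regular expression. This is only an illustration in Python using the standard `re` module. It is not the Siddhi extension and uses neither OpenNLP nor Stanford NLP, and the function name `find_nl_regex_pattern` is hypothetical. Token-level patterns of the TokensRegex kind (matching on POS or NER tags) are outside what it shows.

```python
import re
from typing import List


def find_nl_regex_pattern(sentence: str, regex: str) -> List[str]:
    """Return every non-overlapping match of `regex` in `sentence`, in order.

    Hypothetical stand-in for the proposed findNLRegexPattern operation:
    each returned string corresponds to one emitted event.
    """
    return [match.group(0) for match in re.finditer(regex, sentence)]


if __name__ == "__main__":
    # Example inputs taken from the proposal in the thread.
    print(find_nl_regex_pattern("WSO2 was found in 2005", r"\d{4}"))
```

Note that the thread writes the pattern as \\d{4} (escaped as in a Java string literal); in a Python raw string the same pattern is r"\d{4}".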
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
