Re: [Architecture] [CEP] NLP Toolbox

Chanuka Dissanayake Mon, 01 Sep 2014 02:37:49 -0700

Yes, sure.

Thanks.



On Mon, Sep 1, 2014 at 2:42 PM, Srinath Perera <[email protected]> wrote:

> How about 2pm? (Someone had a conflict in the AM)
>
>
> On Mon, Sep 1, 2014 at 2:40 PM, Srinath Perera <[email protected]> wrote:
>
>> Can we meet and discuss? How about tomorrow 11am?
>>
>>
>> On Thu, Aug 28, 2014 at 6:49 PM, Malithi Edirisinghe <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I have looked in detail at how Stanford NLP extracts grammatical
>>> dependencies and have the following concerns regarding the implementation
>>> of the 3rd query (findRelationship(sentence, regex)).
>>>
>>> Given a sentence, Stanford NLP can recognise around 50 grammatical
>>> relationships. I have listed some of them with simple examples below.
>>>
>>>
>>>    - acomp:adjectival complement
>>>
>>> This is an adjectival phrase which functions as the complement (like an
>>> object of the verb).
>>>
>>> ex:
>>>
>>> “She looks very beautiful” -> acomp(looks, beautiful)
>>>
>>>
>>>    - agent
>>>
>>> This is a complement of a passive verb which is introduced by the
>>> preposition “by” and does the action.
>>>
>>> ex:
>>>
>>> “The man has been killed by the police” -> agent(killed, police)
>>> “Effects caused by the protein are important” -> agent(caused, protein)
>>>
>>>
>>>    - aux:auxiliary
>>>
>>> This is the non-main verb of the clause.
>>>
>>> ex:
>>>
>>> "Reagan has died" -> aux(died, has)
>>> "He should leave" -> aux(leave,should)
>>>
>>>
>>>    - conj:conjunct
>>>
>>> This is the relation between two elements connected by a coordinating
>>> conjunction, such as “and”, “or”, etc.
>>>
>>> ex:
>>>
>>> “Bill is big and honest” -> conj(big, honest)
>>> “They either ski or snowboard” -> conj(ski, snowboard)
>>>
>>>
>>>    - dobj:direct object
>>>
>>>  This is the noun phrase which is the object of the verb.
>>>
>>>  ex:
>>>
>>>  “They win the lottery” -> dobj(win, lottery)
>>>
>>>
>>>    -  nsubj:nominal subject
>>>
>>>  This is a noun phrase which is the syntactic subject of a clause.
>>>
>>>  ex:
>>>  “The baby is cute” -> nsubj(cute, baby)
>>>
>>>  Given this library support, I would like to clarify the following.
>>>
>>>    1. How should we use the regular expression to extract the
>>>    relationship when the library already extracts relationships itself?
>>>    2. What kind of relationships should we extract? For example, is it
>>>    just simple relationships, such as identifying the subject, verb and
>>>    object, or something else?
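
One possible reading of the second question can be sketched as follows: take the dependency triplets the parser already produces and pair each nsubj with a dobj that shares the same governor (the verb). This is only an illustrative sketch in plain Java. The `Dep` and `SvoSketch` classes are hypothetical and not part of Stanford NLP, and the triplets are assumed to have been extracted by the library beforehand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one dependency triplet: relation(governor, dependent).
final class Dep {
    final String relation;
    final String governor;
    final String dependent;

    Dep(String relation, String governor, String dependent) {
        this.relation = relation;
        this.governor = governor;
        this.dependent = dependent;
    }
}

final class SvoSketch {
    // Pairs every nsubj with every dobj that has the same governor (the verb)
    // and returns each pair formatted as "(subject, object, verb)".
    static List<String> subjectObjectTriplets(List<Dep> deps) {
        List<String> out = new ArrayList<>();
        for (Dep subj : deps) {
            if (!subj.relation.equals("nsubj")) {
                continue;
            }
            for (Dep obj : deps) {
                if (obj.relation.equals("dobj") && obj.governor.equals(subj.governor)) {
                    out.add("(" + subj.dependent + ", " + obj.dependent + ", " + subj.governor + ")");
                }
            }
        }
        return out;
    }
}
```

For "They win the lottery", feeding in nsubj(win, They) and dobj(win, lottery) would give a single (They, lottery, win) triplet. Under this reading the regular expression would act only as a filter on the verb or on the relation name, which is exactly the open point raised in the first question.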
>>>
>>>
>>>  I would appreciate your thoughts on this.
>>>
>>>  Thanks,
>>>  Malithi.
>>>
>>>
>>>
>>> On Fri, Aug 22, 2014 at 6:11 PM, Malithi Edirisinghe <[email protected]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> We started the implementation with Stanford NLP for the reasons below.
>>>>
>>>> 1. Stanford NLP provides rich regular expression support for writing
>>>> patterns over tokens, rather than working at character level as with
>>>> normal Java regular expressions.
>>>>
>>>> 2. Stanford NLP can extract grammatical relationships from the parsed
>>>> tree, so we can easily implement the 3rd query.
>>>>
>>>> Thanks,
>>>>
>>>> Malithi.
>>>>
>>>>
>>>> On Thu, Aug 21, 2014 at 12:58 PM, Malithi Edirisinghe <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Suho,
>>>>>
>>>>> Since Named Entity Recognition is supported by both libraries, we can
>>>>> implement the first function with either of them. Both can identify
>>>>> entities such as person, location, organization, etc. For the fourth
>>>>> function, we found that we can simply define dictionaries in OpenNLP.
>>>>> There is a class called DictionaryNameFinder which takes a Dictionary and
>>>>> identifies any entry in the sentence that matches the dictionary. In
>>>>> Stanford NLP, we found that there is an implementation of a Dictionary,
>>>>> but we have not yet found a way of using it for our requirement. It lacks
>>>>> samples, and it seems we will have to look into their code to see how
>>>>> they have used it. We will work on it. In any case, I think it should be
>>>>> possible to define such a Dictionary in Stanford NLP as well.
>>>>>
>>>>> Thanks,
>>>>> Malithi.
>>>>>
>>>>>
>>>>> On Thu, Aug 21, 2014 at 10:09 AM, Sriskandarajah Suhothayan <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> That's a good comparison.
>>>>>> Based on this, I believe we have issues in implementing functions 2 &
>>>>>> 3 using OpenNLP.
>>>>>> Can you evaluate the other functions as well?
>>>>>>
>>>>>> Suho
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 21, 2014 at 9:54 AM, Chanuka Dissanayake <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> We did a study of both the OpenNLP and Stanford NLP libraries and
>>>>>>> looked at the features that could support our implementation.
>>>>>>> Our findings are summarised below.
>>>>>>>
>>>>>>> It seems that Stanford NLP has better capabilities when it comes to
>>>>>>> support for regular expressions and parsing.
>>>>>>> We would like to discuss this further and choose the appropriate library.
>>>>>>>
>>>>>>>
>>>>>>> Feature: Named Entity Recognizer
>>>>>>>   OpenNLP: Identifies person, location, organization, time, date,
>>>>>>>   money and percentage entities in the given sentence, but the
>>>>>>>   sentence needs to be tokenized first.
>>>>>>>   StanfordNLP: Includes a 4 class model trained for CoNLL, a 7 class
>>>>>>>   model trained for MUC, and a 3 class model trained on both data sets
>>>>>>>   for the intersection of those class sets.
>>>>>>>     3 class: Location, Person, Organization
>>>>>>>     4 class: Location, Person, Organization, Misc
>>>>>>>     7 class: Time, Location, Organization, Person, Money, Percent, Date
>>>>>>>
>>>>>>> Feature: POS Tagger
>>>>>>>   OpenNLP: Identifies VP (Verb Phrase), NP (Noun Phrase),
>>>>>>>   JJ (Adjective), etc.
>>>>>>>     Input: Hi. How are you? This is Mike
>>>>>>>     Output: Hi_NNP How_WRB are_VBP you? _JJ This_DT is_VBZ Mike._NNP
>>>>>>>   StanfordNLP: Labels each token with its POS tag, such as noun, verb,
>>>>>>>   adjective, etc.
>>>>>>>
>>>>>>> Feature: Tokenizing
>>>>>>>   OpenNLP: Separates words on the whitespace between them by default.
>>>>>>>   Otherwise it can be trained to tokenize according to different
>>>>>>>   options.
>>>>>>>   StanfordNLP: Can tokenize the text either by whitespace or as per
>>>>>>>   the options defined.
>>>>>>>
>>>>>>> Feature: Parsing
>>>>>>>   OpenNLP: Given a tokenized sentence, it constructs the tree
>>>>>>>   structure.
>>>>>>>   StanfordNLP: Works out the grammatical structure of sentences as a
>>>>>>>   tree. The parser provides Stanford Dependencies as well. They
>>>>>>>   represent the grammatical relations between words in a sentence.
>>>>>>>   Dependencies are triplets: name of the relation, governor and
>>>>>>>   dependent.
>>>>>>>     Ex: Bell, based in Los Angeles, makes and distributes electronic,
>>>>>>>     computer and building products.
>>>>>>>     Dependency: nsubj(distributes-10, Bell-1)
>>>>>>>     This is like saying “the subject of distributes is Bell.”
>>>>>>>
>>>>>>> Feature: Sentence Detection
>>>>>>>   OpenNLP: Detects sentence boundaries given a paragraph.
>>>>>>>   StanfordNLP: Available as ssplit. Can split sentences as per the
>>>>>>>   options defined.
>>>>>>>
>>>>>>> Feature: Regular Expressions
>>>>>>>   OpenNLP: Character-wise regular expressions only. Cannot identify
>>>>>>>   named entities or PoS tags via regular expressions.
>>>>>>>   StanfordNLP: Two tools are provided to deal with regular expressions.
>>>>>>>     RegexNER: Can define simple rules with regular expressions and
>>>>>>>     label entities with NE labels that are not provided.
>>>>>>>       Ex: Bachelor of (Arts|Laws|Science|Engineering) DEGREE
>>>>>>>       This rule will label tokens matching the regex in the first
>>>>>>>       column as DEGREE.
>>>>>>>     TokensRegex: Can identify patterns over a list of tokens. In
>>>>>>>     addition to Java regex matching, this provides syntax to match
>>>>>>>     part of speech tags, named entity tags and lemmas.
>>>>>>>       Ex: [ { tag:VBD } ], /University/ /of/ [{ ner:LOCATION }]
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Chanuka.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 19, 2014 at 11:11 PM, Sriskandarajah Suhothayan <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> +1 looks good
>>>>>>>>
>>>>>>>> Suho
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Aug 19, 2014 at 9:56 PM, Srinath Perera <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Looks good. If possible we should do this with OpenNLP, as it has
>>>>>>>>> the Apache licence. However, I could not find an NLP regex
>>>>>>>>> implementation there. Please look at it in detail.
>>>>>>>>>
>>>>>>>>> --Srinath
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Aug 19, 2014 at 9:52 PM, Malithi Edirisinghe <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> We are working on an NLP Toolbox improvement in CEP. The main idea
>>>>>>>>>> of this improvement is to use an NLP library and let users perform
>>>>>>>>>> some NLP operations as Siddhi extensions.
>>>>>>>>>>
>>>>>>>>>> In our implementation we have decided to support the following NLP
>>>>>>>>>> operations.
>>>>>>>>>>
>>>>>>>>>> *1. findNameEntityType(sentence, entityType)*
>>>>>>>>>>
>>>>>>>>>> *Description:*
>>>>>>>>>>
>>>>>>>>>> This operation takes a sentence and a predefined entity type as
>>>>>>>>>> its inputs. It will return the noun(s) in the sentence that match
>>>>>>>>>> the defined entity type, as event(s).
>>>>>>>>>>
>>>>>>>>>> *inputs:*
>>>>>>>>>>
>>>>>>>>>> sentence  : sentence to be processed
>>>>>>>>>> entityType: predefined entity type
>>>>>>>>>>     ORGANIZATION
>>>>>>>>>>     NAME
>>>>>>>>>>     LOCATION
>>>>>>>>>>
>>>>>>>>>> *output:*
>>>>>>>>>>
>>>>>>>>>> matching noun(s) as event(s)
>>>>>>>>>>
>>>>>>>>>> *example:*
>>>>>>>>>>
>>>>>>>>>>  inputs:
>>>>>>>>>> sentence   : Alice works at WSO2
>>>>>>>>>>  entityType : NAME
>>>>>>>>>>
>>>>>>>>>>  output: Alice
>>>>>>>>>>
>>>>>>>>>> *2. findNLRegexPattern(sentence, regex)*
>>>>>>>>>>
>>>>>>>>>> *Description:*
>>>>>>>>>>
>>>>>>>>>> This operation takes a sentence and a regular expression as its
>>>>>>>>>> inputs. It will return each match in the sentence, as an event.
>>>>>>>>>>
>>>>>>>>>> *inputs:*
>>>>>>>>>>
>>>>>>>>>> sentence  : sentence to be processed
>>>>>>>>>> regex     : regular expression to be matched
>>>>>>>>>>
>>>>>>>>>> *output:*
>>>>>>>>>>
>>>>>>>>>> matching phrase(s) as event(s)
>>>>>>>>>>
>>>>>>>>>> *example:*
>>>>>>>>>>
>>>>>>>>>> inputs:
>>>>>>>>>>  sentence   : WSO2 was founded in 2005
>>>>>>>>>>  regex        : \\d{4}
>>>>>>>>>>
>>>>>>>>>>  output: 2005
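
As a baseline for this operation, the example above can be reproduced with plain character-level Java regular expressions. This is only a sketch using java.util.regex. The `RegexMatchSketch` class is a hypothetical name, and a token-aware NLP regex would replace the matcher in a real extension.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexMatchSketch {
    // Returns every non-overlapping match of the regex in the sentence,
    // in order of appearance. Each element would become one output event.
    static List<String> findMatches(String sentence, String regex) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(sentence);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }
}
```

Calling findMatches("WSO2 was founded in 2005", "\\d{4}") yields the single match "2005", since the "2" in "WSO2" is not a run of four digits.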
>>>>>>>>>>
>>>>>>>>>> *3. findRelationship(sentence, regex)*
>>>>>>>>>>
>>>>>>>>>> *Description:*
>>>>>>>>>>
>>>>>>>>>> This operation takes a sentence and a regular expression as its
>>>>>>>>>> inputs. For each relationship extracted using the regular
>>>>>>>>>> expression, the operation will return a triplet (subject, object
>>>>>>>>>> and relationship) as an event.
>>>>>>>>>>
>>>>>>>>>> *inputs:*
>>>>>>>>>>
>>>>>>>>>> sentence  : sentence to be processed
>>>>>>>>>> regex     : regular expression to extract the relationship
>>>>>>>>>>
>>>>>>>>>> *output:*
>>>>>>>>>>
>>>>>>>>>> triplet(s) of (subject, object, relationship) as event(s)
>>>>>>>>>>
>>>>>>>>>> *example:*
>>>>>>>>>>
>>>>>>>>>>  inputs:
>>>>>>>>>> sentence   : Bob works for WSO2
>>>>>>>>>>  regex        : works for
>>>>>>>>>>
>>>>>>>>>>  output: (Bob, WSO2, works for)
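
A deliberately naive sketch of this example, assuming the text before the regex match is the subject and the text after it is the object. It uses no grammatical analysis at all, so it only works for simple sentences like the one above. The `RelationshipSketch` class is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RelationshipSketch {
    // Returns {subject, object, relationship} for the first regex match,
    // or an empty array when the regex does not match the sentence.
    static String[] findRelationship(String sentence, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(sentence);
        if (!matcher.find()) {
            return new String[0];
        }
        // Naive assumption: everything before the match is the subject and
        // everything after it is the object.
        String subject = sentence.substring(0, matcher.start()).trim();
        String object = sentence.substring(matcher.end()).trim();
        return new String[] { subject, object, matcher.group() };
    }
}
```

For "Bob works for WSO2" and the regex "works for" this gives (Bob, WSO2, works for). Longer sentences would need the subject and object to come from parsed dependencies instead of raw substrings.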
>>>>>>>>>>
>>>>>>>>>> *4. findNameEntityTypeViaDictionary(sentence, dictionary,
>>>>>>>>>> entityType)*
>>>>>>>>>>
>>>>>>>>>> *Description:*
>>>>>>>>>>
>>>>>>>>>> This operation takes a sentence, a dictionary file and a
>>>>>>>>>> predefined entity type as its inputs. It will return the noun(s)
>>>>>>>>>> in the sentence of the defined entity type that also exist in the
>>>>>>>>>> dictionary, as event(s).
>>>>>>>>>>
>>>>>>>>>> *inputs:*
>>>>>>>>>>
>>>>>>>>>> sentence   : sentence to be processed
>>>>>>>>>> dictionary : dictionary of entities of the defined entity type
>>>>>>>>>> entityType : predefined entity type
>>>>>>>>>>     ORGANIZATION
>>>>>>>>>>     NAME
>>>>>>>>>>     LOCATION
>>>>>>>>>>
>>>>>>>>>> *output:*
>>>>>>>>>>
>>>>>>>>>> matching noun(s) as event(s)
>>>>>>>>>>
>>>>>>>>>> *example:*
>>>>>>>>>>
>>>>>>>>>>  inputs:
>>>>>>>>>> sentence    : Bob works at WSO2
>>>>>>>>>>  dictionary   : (WSO2,ORACLE,IBM)
>>>>>>>>>> entityType  : ORGANIZATION
>>>>>>>>>>
>>>>>>>>>> output: WSO2
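
The dictionary lookup in this example can be sketched with a plain set membership test, assuming whitespace tokenization, single-token dictionary entries, and a dictionary that already contains only entities of the requested type (so the entityType check is left out). The `DictionaryLookupSketch` class is a hypothetical name and does not use either NLP library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class DictionaryLookupSketch {
    // Returns the tokens of the sentence that appear in the dictionary,
    // in sentence order. Matching is exact and case-sensitive.
    static List<String> findInDictionary(String sentence, Set<String> dictionary) {
        List<String> found = new ArrayList<>();
        for (String token : sentence.trim().split("\\s+")) {
            if (dictionary.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }
}
```

With the sentence "Bob works at WSO2" and the dictionary (WSO2, ORACLE, IBM), the only token returned is WSO2. Multi-word entries and punctuation handling would need a real tokenizer.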
>>>>>>>>>>
>>>>>>>>>> Each NLP operation defined here will be implemented as a
>>>>>>>>>> transformer extension to Siddhi.
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> *Malithi Edirisinghe*
>>>>>>>>>> Senior Software Engineer
>>>>>>>>>> WSO2 Inc.
>>>>>>>>>>
>>>>>>>>>> Mobile : +94 (0) 718176807
>>>>>>>>>>  [email protected]
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> ============================
>>>>>>>>> Director, Research, WSO2 Inc.
>>>>>>>>> Visiting Faculty, University of Moratuwa
>>>>>>>>> Member, Apache Software Foundation
>>>>>>>>> Research Scientist, Lanka Software Foundation
>>>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>>>> Phone: 0772360902
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> *S. Suhothayan*
>>>>>>>> Technical Lead & Team Lead of WSO2 Complex Event Processor
>>>>>>>> *WSO2 Inc.* http://wso2.com
>>>>>>>> lean . enterprise . middleware
>>>>>>>>
>>>>>>>> cell: (+94) 779 756 757 | blog: http://suhothayan.blogspot.com/
>>>>>>>> twitter: http://twitter.com/suhothayan
>>>>>>>> linked-in: http://lk.linkedin.com/in/suhothayan
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Chanuka Dissanayake
>>>>>>> *Software Engineer | **WSO2 Inc.*; http://wso2.com
>>>>>>>
>>>>>>> Mobile: +94 71 33 63 596
>>>>>>> Email: [email protected]
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>



-- 
Chanuka Dissanayake
*Software Engineer | **WSO2 Inc.*; http://wso2.com

Mobile: +94 71 33 63 596
Email: [email protected]

_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
