Hi Suho,

Thank you for the reply, I'll try it.


On Thu, Sep 4, 2014 at 10:18 PM, Sriskandarajah Suhothayan <[email protected]>
wrote:

> Hi
>
> For this demo first we'll get some twitter data to csv file, you can do
> this by writing twitter client using twitter4j[1].
> CEP has the capability to playback the data from the csv file using the
> CEP tryit feature (available from CEP 4.0.0).
>
> IMHO I dont think running this with the live data stream will be much
> useful because we wont be able to demonstrate all the different use-cases
> with that.
>
> Regards
> Suho
>
> [1]http://twitter4j.org/en/index.html
>
>
>
> On Thu, Sep 4, 2014 at 7:22 PM, Chanuka Dissanayake <[email protected]>
> wrote:
>
>> Hi All,
>>
>> In order to check the functions, I got a twitter stream using ESB twitter
>> connector. Part of that stream shown below.
>>
>> <statuses>
>>    <statuses>
>>       <metadata>
>>          <result_type>popular</result_type>
>>          <iso_language_code>it</iso_language_code>
>>       </metadata>
>>       <created_at>Tue Sep 02 13:09:08 +0000 2014</created_at>
>>       <id>506790930058199040</id>
>>       <id_str>506790930058199040</id_str>
>>       <text>Donate. Comunque donate. 💪❤️accetto la nomination all'
>> #icebucketchallange di veronicazimbaro e…http://t.co/eYm7WFFZd4</text>
>>       <source>&lt;a href="http://instagram.com";
>> rel="nofollow"&gt;Instagram&lt;/a&gt;</source>
>>       <truncated>false</truncated>
>>       <in_reply_to_status_id/>
>>       <in_reply_to_status_id_str/>
>>       <in_reply_to_user_id/>
>>       <in_reply_to_user_id_str/>
>>       <in_reply_to_screen_name/>
>>       <user>
>>          <id>563995182</id>
>>          <id_str>563995182</id_str>
>>          <name>Martina Maccari</name>
>>          <screen_name>MartinaZoev</screen_name>
>>          <location>Torino</location>
>>          <description>Prendere o lasciare.</description>
>>          <url>http://t.co/P8zSwRXvoM</url>
>>          <entities>
>>             <url>
>>                <urls>
>>                   <url>http://t.co/P8zSwRXvoM</url>
>>                   <expanded_url>http://www.zoodizoev.com</expanded_url>
>>                   <display_url>zoodizoev.com</display_url>
>>                   <indices>0</indices>
>>                   <indices>22</indices>
>>                </urls>
>>             </url>
>>             <description/>
>>          </entities>
>>          <protected>false</protected>
>>          <followers_count>18057</followers_count>
>>          <friends_count>96</friends_count>
>>          <listed_count>68</listed_count>
>>          <created_at>Thu Apr 26 17:54:59 +0000 2012</created_at>
>>          <favourites_count>144</favourites_count>
>>          <utc_offset>7200</utc_offset>
>>          <time_zone>Rome</time_zone>
>>          <geo_enabled>true</geo_enabled>
>>          <verified>false</verified>
>>          <statuses_count>2094</statuses_count>
>>          <lang>it</lang>
>>          <contributors_enabled>false</contributors_enabled>
>>          <is_translator>false</is_translator>
>>          <is_translation_enabled>false</is_translation_enabled>
>>          <profile_background_color>131516</profile_background_color>
>>          <profile_background_image_url>
>> http://abs.twimg.com/images/themes/theme14/bg.gif
>> </profile_background_image_url>
>>          <profile_background_image_url_https>
>> https://abs.twimg.com/images/themes/theme14/bg.gif
>> </profile_background_image_url_https>
>>          <profile_background_tile>false</profile_background_tile>
>>          <profile_image_url>
>> http://pbs.twimg.com/profile_images/378800000761725544/1efa8c9032ac97c42619986fc52adb7a_normal.jpeg
>> </profile_image_url>
>>          <profile_image_url_https>
>> https://pbs.twimg.com/profile_images/378800000761725544/1efa8c9032ac97c42619986fc52adb7a_normal.jpeg
>> </profile_image_url_https>
>>          <profile_banner_url>
>> https://pbs.twimg.com/profile_banners/563995182/1384873417
>> </profile_banner_url>
>>           <profile_link_color>F518DB</profile_link_color>
>>
>>  <profile_sidebar_border_color>EEEEEE</profile_sidebar_border_color>
>>          <profile_sidebar_fill_color>EFEFEF</profile_sidebar_fill_color>
>>          <profile_text_color>333333</profile_text_color>
>>          <profile_use_background_image>true</profile_use_background_image>
>>          <default_profile>false</default_profile>
>>          <default_profile_image>false</default_profile_image>
>>          <following/>
>>          <follow_request_sent/>
>>          <notifications/>
>>       </user>
>>       <geo/>
>>       <coordinates/>
>>       <place/>
>>       <contributors/>
>>       <retweet_count>41</retweet_count>
>>       <favorite_count>62</favorite_count>
>>       <entities>
>>          <hashtags>
>>             <text>icebucketchallange</text>
>>             <indices>55</indices>
>>             <indices>74</indices>
>>          </hashtags>
>>          <urls>
>>             <url>http://t.co/eYm7WFFZd4</url>
>>             <expanded_url>http://instagram.com/p/scbsmeurVA/
>> </expanded_url>
>>             <display_url>instagram.com/p/scbsmeurVA/</display_url>
>>             <indices>97</indices>
>>             <indices>119</indices>
>>          </urls>
>>       </entities>
>>     <favorited>false</favorited>
>>       <retweeted>false</retweeted>
>>       <possibly_sensitive>false</possibly_sensitive>
>>       <lang>it</lang>
>>    </statuses>
>>    <statuses>
>>       <metadata>
>>          <iso_language_code>und</iso_language_code>
>>          <result_type>popular</result_type>
>>       </metadata>
>>       <created_at>Tue Sep 02 21:52:21 +0000 2014</created_at>
>>       <id>506922604003725313</id>
>>       <id_str>506922604003725313</id_str>
>>       <text>#RT #icebucketchallange http://t.co/UXXzndKwH1</text>
>>       <source>&lt;a href="http://twitter.com/download/android";
>> rel="nofollow"&gt;Twitter for Android&lt;/a&gt;</source>
>>       <truncated>false</truncated>
>>       <in_reply_to_status_id/>
>>       <in_reply_to_status_id_str/>
>>       <in_reply_to_user_id/>
>>       <in_reply_to_user_id_str/>
>>       <in_reply_to_screen_name/>
>>       <user>
>>          <id>1103956165</id>
>>          <id_str>1103956165</id_str>
>>          <name>Defensive Backs</name>
>>          <screen_name>DB__TWEETS</screen_name>
>>          <location>#NoFlyZone</location>
>>          <description>blown a coverage? make up for it... missed a
>> tackle? execute more... got burned? learn from your mistake... got scored
>> on? dont let it happen again!!!</description>
>>          <url/>
>>          <entities>
>>             <description/>
>>          </entities>
>>          <protected>false</protected>
>>          <followers_count>5185</followers_count>
>>          <friends_count>2222</friends_count>
>>          <listed_count>2</listed_count>
>>          <created_at>Sat Jan 19 15:24:29 +0000 2013</created_at>
>>          <favourites_count>3579</favourites_count>
>>          <utc_offset>-18000</utc_offset>
>>          <time_zone>Central Time (US &amp; Canada)</time_zone>
>>           <geo_enabled>true</geo_enabled>
>>          <verified>false</verified>
>>          <statuses_count>4018</statuses_count>
>>          <lang>en</lang>
>>          <contributors_enabled>false</contributors_enabled>
>>          <is_translator>false</is_translator>
>>          <is_translation_enabled>false</is_translation_enabled>
>>          <profile_background_color>C0DEED</profile_background_color>
>>          <profile_background_image_url>
>> http://abs.twimg.com/images/themes/theme1/bg.png
>> </profile_background_image_url>
>>          <profile_background_image_url_https>
>> https://abs.twimg.com/images/themes/theme1/bg.png
>> </profile_background_image_url_https>
>>          <profile_background_tile>false</profile_background_tile>
>>          <profile_image_url>
>> http://pbs.twimg.com/profile_images/483757082978430977/YrXT9l4B_normal.jpeg
>> </profile_image_url>
>>          <profile_image_url_https>
>> https://pbs.twimg.com/profile_images/483757082978430977/YrXT9l4B_normal.jpeg
>> </profile_image_url_https>
>>          <profile_banner_url>
>> https://pbs.twimg.com/profile_banners/1103956165/1394929163
>> </profile_banner_url>
>>          <profile_link_color>0084B4</profile_link_color>
>>
>>  <profile_sidebar_border_color>C0DEED</profile_sidebar_border_color>
>>          <profile_sidebar_fill_color>DDEEF6</profile_sidebar_fill_color>
>>          <profile_text_color>333333</profile_text_color>
>>          <profile_use_background_image>true</profile_use_background_image>
>>          <default_profile>true</default_profile>
>>          <default_profile_image>false</default_profile_image>
>>          <following/>
>>          <follow_request_sent/>
>>          <notifications/>
>>       </user>
>>       <geo/>
>>       <coordinates/>
>>       <place/>
>>       <contributors/>
>>       <retweet_count>20</retweet_count>
>>       <favorite_count>14</favorite_count>
>>       <entities>
>>          <hashtags>
>>             <text>RT</text>
>>             <indices>0</indices>
>>             <indices>3</indices>
>>          </hashtags>
>>          <hashtags>
>>             <text>icebucketchallange</text>
>>             <indices>4</indices>
>>             <indices>23</indices>
>>          </hashtags>
>>          <media>
>>             <id>506922601248063488</id>
>>             <id_str>506922601248063488</id_str>
>>             <indices>24</indices>
>>             <indices>46</indices>
>>             <media_url>http://pbs.twimg.com/media/Bwjza4gCAAA9xto.jpg
>> </media_url>
>>             <media_url_https>
>> https://pbs.twimg.com/media/Bwjza4gCAAA9xto.jpg</media_url_https>
>>             <url>http://t.co/UXXzndKwH1</url>
>>             <display_url>pic.twitter.com/UXXzndKwH1</display_url>
>>             <expanded_url>
>> http://twitter.com/DB__TWEETS/status/506922604003725313/photo/1
>> </expanded_url>
>>             <type>photo</type>
>>             <sizes>
>>                <small>
>>                   <w>288</w>
>>                   <h>204</h>
>>                   <resize>fit</resize>
>>                </small>
>>                <medium>
>>                   <w>288</w>
>>                   <h>204</h>
>>                   <resize>fit</resize>
>>                </medium>
>>                <large>
>>                   <w>288</w>
>>                   <h>204</h>
>>                   <resize>fit</resize>
>>                </large>
>>                <thumb>
>>                   <w>150</w>
>>                   <h>150</h>
>>                   <resize>crop</resize>
>>                </thumb>
>>             </sizes>
>>          </media>
>>       </entities>
>>       <favorited>false</favorited>
>>       <retweeted>false</retweeted>
>>       <possibly_sensitive>false</possibly_sensitive>
>>       <lang>und</lang>
>>    </statuses>
>>    <statuses>
>>       <metadata>
>>          <result_type>popular</result_type>
>>          <iso_language_code>nl</iso_language_code>
>>       </metadata>
>>       <created_at>Fri Aug 29 22:44:20 +0000 2014</created_at>
>>       <id>505486135469309952</id>
>>       <id_str>505486135469309952</id_str>
>>       <text>#ALS Foundation geeft toe dat 73% van de donaties niet wordt
>> gebruikt voor ALS onderzoek: http://t.co/dgfSvFQC2Q
>> #Icebucketchallange</text>
>>       <source>&lt;a href="http://tapbots.com/tweetbot";
>> rel="nofollow"&gt;Tweetbot for iÎźS&lt;/a&gt;</source>
>>       <truncated>false</truncated>
>>       <in_reply_to_status_id/>
>>       <in_reply_to_status_id_str/>
>>       <in_reply_to_user_id/>
>>       <in_reply_to_user_id_str/>
>>       <in_reply_to_screen_name/>
>>       <user>
>>          <id>98353402</id>
>>          <id_str>98353402</id_str>
>>          <name>Petra Blankwaard</name>
>>          <screen_name>indigonl</screen_name>
>>          <location>Den Haag</location>
>>          <description>Webnerd ~ Apple ~ MINI ~ Magento ~ WordPress ~ SEO
>> ~ motorrijden ~ duiken ~ humor ~ psychologie ~ wetenschap ~ Recht is vaak
>> krom</description>
>>          <url>http://t.co/t1B53SkiYm</url>
>>          <entities>
>>             <url>
>>                <urls>
>>                   <url>http://t.co/t1B53SkiYm</url>
>>                   <expanded_url>http://www.indigowebstudio.nl
>> </expanded_url>
>>                   <display_url>indigowebstudio.nl</display_url>
>>                   <indices>0</indices>
>>                   <indices>22</indices>
>>                </urls>
>>             </url>
>>             <description/>
>>          </entities>
>>          <protected>false</protected>
>>          <followers_count>3120</followers_count>
>>          <friends_count>1847</friends_count>
>>          <listed_count>177</listed_count>
>>          <created_at>Mon Dec 21 11:20:58 +0000 2009</created_at>
>>          <favourites_count>30</favourites_count>
>>          <utc_offset>7200</utc_offset>
>>          <time_zone>Amsterdam</time_zone>
>>          <geo_enabled>true</geo_enabled>
>>          <verified>false</verified>
>>          <statuses_count>60574</statuses_count>
>>          <lang>nl</lang>
>>          <contributors_enabled>false</contributors_enabled>
>>          <is_translator>false</is_translator>
>>          <is_translation_enabled>false</is_translation_enabled>
>>          <profile_background_color>273182</profile_background_color>
>>          <profile_background_image_url>
>> http://pbs.twimg.com/profile_background_images/265611238/Twitter_page.png
>> </profile_background_image_url>
>>          <profile_background_image_url_https>
>> https://pbs.twimg.com/profile_background_images/265611238/Twitter_page.png
>> </profile_background_image_url_https>
>>          <profile_background_tile>false</profile_background_tile>
>>          <profile_image_url>
>> http://pbs.twimg.com/profile_images/378800000577198041/b27b8688897f286e45ec1c8aee8afbe2_normal.jpeg
>> </profile_image_url>
>>          <profile_image_url_https>
>> https://pbs.twimg.com/profile_images/378800000577198041/b27b8688897f286e45ec1c8aee8afbe2_normal.jpeg
>> </profile_image_url_https>
>>          <profile_banner_url>
>> https://pbs.twimg.com/profile_banners/98353402/1394458520
>> </profile_banner_url>
>>          <profile_link_color>E39517</profile_link_color>
>>
>>  <profile_sidebar_border_color>FFFFFF</profile_sidebar_border_color>
>>          <profile_sidebar_fill_color>DEDEDE</profile_sidebar_fill_color>
>>          <profile_text_color>273182</profile_text_color>
>>
>>  <profile_use_background_image>false</profile_use_background_image>
>>          <default_profile>false</default_profile>
>>          <default_profile_image>false</default_profile_image>
>>          <following/>
>>          <follow_request_sent/>
>>          <notifications/>
>>       </user>
>>       <geo/>
>>       <coordinates/>
>>       <place/>
>>       <contributors/>
>>       <retweet_count>54</retweet_count>
>>       <favorite_count>3</favorite_count>
>>       <entities>
>>          <hashtags>
>>             <text>ALS</text>
>>             <indices>0</indices>
>>             <indices>4</indices>
>>          </hashtags>
>>          <hashtags>
>>             <text>Icebucketchallange</text>
>>             <indices>113</indices>
>>             <indices>132</indices>
>>          </hashtags>
>>          <urls>
>>             <url>http://t.co/dgfSvFQC2Q</url>
>>             <expanded_url>http://bit.ly/1rF4RY4</expanded_url>
>>             <display_url>bit.ly/1rF4RY4</display_url>
>>             <indices>90</indices>
>>             <indices>112</indices>
>>          </urls>
>>       </entities>
>>       <favorited>false</favorited>
>>       <retweeted>false</retweeted>
>>       <possibly_sensitive>false</possibly_sensitive>
>>       <lang>nl</lang>
>>    </statuses>
>>
>>
>>
>> Problem is: By writing a mediator class we can extract the relevant
>> information, but content of the stream in not descriptive enough to test
>> the functions.
>>
>>
>>    - Is there any better way/inputs to do the testing?
>>    - If we try to get the stream continuously, is there any playback
>>    option to retrieve the data in CEP?
>>    - Is, writing a mediator class to extract the data and push that to
>>    CEP from ESB as scheduled task, is better?
>>
>>
>> Thank you.
>>
>>
>> On Tue, Sep 2, 2014 at 5:51 PM, Malithi Edirisinghe <[email protected]>
>> wrote:
>>
>>> Hi All,
>>>
>>> After having a discussion on $subject with Srinath and Suho we agreed on
>>> following changes for our implementation.
>>>
>>> 1. The 2nd operation findNLRegexPattern(sentence, regex) is renamed to
>>> findTokensRegexPattern(sentence, regex) since this exposes the TokensRegex
>>> support in Stanford NLP library.
>>>
>>> 2. Introduced the following operation to expose the Semgrex regular
>>> expression support in Stanford NLP.
>>>
>>>
>>>    - findSemgrexPattern(sentence, regex)
>>>
>>>  Description:
>>>
>>> This operation takes a sentence and a regular expression as it's inputs.
>>> It will return each match in the sentence, as an event.
>>>
>>> inputs:
>>>
>>> sentence  : sentence to be processed
>>> regex       : regular expression to be matched. Regex sytax should be in
>>> Stanford NLP Semgrex
>>>  output: matching pharase(s) as event(s)
>>>
>>>
>>> example:
>>>
>>> inputs:
>>>  sentence   : They win the lottery
>>> regex         : {} >/nsubj|agent/ {}
>>>
>>> output: win
>>>
>>>
>>> 3. Introduced following two operations to extract relationships instead
>>> of the 3rd operation   findRelationship(sentence, regex) defined above.
>>>
>>>
>>>    - findRelationshipByVerb(sentence, verb)
>>>
>>> Description:
>>>
>>> This operation takes a sentence and a verb as it's inputs. It extract
>>> the subject for the defined verb and object for the defined verb. For each
>>> such relationship extracted from the operation will return a triplet;
>>> subject, object and verb as an event.
>>>
>>> inputs:
>>>
>>> sentence  : sentence to be processed
>>> verb       : verb to extract the relationship
>>>  output: triplet(s) of (subject, object, verb) as event(s)
>>>
>>> example:
>>>
>>> inputs:
>>> sentence  : They win the lottery
>>>  verb         : works for
>>>
>>> output: (Bob, WSO2, verb)
>>>  inputs:
>>> sentence   : The man has been killed by the police
>>>
>>>  verb         : killed
>>>
>>>
>>> output: (police, man, killed)
>>>
>>>
>>>
>>>    - findRelationshipByRegex(sentence, regex)
>>>
>>> This operation takes a sentence and a regex as it's input. The regex
>>> should define a regular expression to extract subject, object and
>>> relationship. If regex is defined as per the syntax all matches found will
>>> be returned as a triplet; subject, object and relationship as an event
>>> otherwise an error is thrown.
>>>
>>> inputs:
>>>
>>> sentence  : sentence to be processed
>>> verb       : regex to extract the relationship
>>>  output: triplet(s) of (subject, object, verb) as event(s)
>>>
>>> example:
>>>
>>> inputs:
>>> sentence   : They win the lottery
>>>  regex        : {}=verb >/nsubj|agent/ {}=subject ?>/dobj/ {}=object
>>>
>>> output: (They, lottery, win)
>>>  Note:
>>>
>>> With the NLP library we can simply get the match of the above regular
>>> expression which is "win" in this case or either we can get each node named
>>> via the regular expression. i.e verb -> "win", subject -> "They", object ->
>>> "lottery"
>>>
>>> Welcome any comments you might have on above changes.
>>>
>>> Thank You.
>>> Malithi.
>>>
>>>
>>>
>>> On Mon, Sep 1, 2014 at 3:06 PM, Chanuka Dissanayake <[email protected]>
>>> wrote:
>>>
>>>> Yes, sure.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> On Mon, Sep 1, 2014 at 2:42 PM, Srinath Perera <[email protected]>
>>>> wrote:
>>>>
>>>>> How about 2pm? (Someone had a conflict in the AM)
>>>>>
>>>>>
>>>>> On Mon, Sep 1, 2014 at 2:40 PM, Srinath Perera <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Can we meet and discuss? How about tomorrow 11am?
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 28, 2014 at 6:49 PM, Malithi Edirisinghe <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have looked at how Stanford NLP extract grammatical dependencies
>>>>>>> in detail and have following concerns with regard to the implementation 
>>>>>>> of
>>>>>>> 3rd query(findRelationship(sentence, regex)).
>>>>>>>
>>>>>>> When a sentence is given Stanford NLP can recognise around 50
>>>>>>> grammatical relationships. I have listed some with simple examples 
>>>>>>> below.
>>>>>>>
>>>>>>>
>>>>>>>    - acomp:adjective complement
>>>>>>>
>>>>>>> This is an adjectival phrase which functions as the complement (like
>>>>>>> an object of the verb).
>>>>>>>
>>>>>>> ex:
>>>>>>>
>>>>>>> “She looks very beautiful” -> acomp(looks, beautiful)
>>>>>>>
>>>>>>>
>>>>>>>    - agent
>>>>>>>
>>>>>>> This is a complement of a passive verb which is introduced by the
>>>>>>> preposition “by” and does the action.
>>>>>>>
>>>>>>> ex:
>>>>>>>
>>>>>>> “The man has been killed by the police” -> agent(killed, police)
>>>>>>> “Effects caused by the protein are important” -> agent(caused,
>>>>>>> protein)
>>>>>>>
>>>>>>>
>>>>>>>    - aux:auxiliary
>>>>>>>
>>>>>>> This is the non-main verb of the clause
>>>>>>>
>>>>>>> ex:
>>>>>>>
>>>>>>> "Reagan has died" -> aux(died, has)
>>>>>>> "He should leave" -> aux(leave,should)
>>>>>>>
>>>>>>>
>>>>>>>    - conj:conjunct
>>>>>>>
>>>>>>> This is the relation between two elements connected by a
>>>>>>> coordinating conjunction, such as “and”, “or”, etc.
>>>>>>>
>>>>>>> ex:
>>>>>>>
>>>>>>> “Bill is big and honest” -> conj(big, honest)
>>>>>>> “They either ski or snowboard” -> conj(ski, snowboard)
>>>>>>>
>>>>>>>
>>>>>>>    - dobj:direct object
>>>>>>>
>>>>>>>  This is the noun phrase which is the object of the verb.
>>>>>>>
>>>>>>>  ex:
>>>>>>>
>>>>>>>  “They win the lottery” -> dobj(win, lottery)
>>>>>>>
>>>>>>>
>>>>>>>    -  nsubj:nominal subject
>>>>>>>
>>>>>>>  This is a noun phrase which is the syntactic subject of a clause.
>>>>>>>
>>>>>>>  ex:
>>>>>>>  “The baby is cute” -> nsubj(cute, baby)
>>>>>>>
>>>>>>>  With this library support, I would like to clarify on following.
>>>>>>>
>>>>>>>    1.  How should we use the regular expression to extract the
>>>>>>>    relationship while the library is extracting relationships itself?
>>>>>>>    2. What kind of relationships should we extract, for an example
>>>>>>>    is it just simple relationships as identifying the subject, verb and 
>>>>>>> object
>>>>>>>    or any other?
>>>>>>>
>>>>>>>
>>>>>>>  Kindly expect your thoughts on this.
>>>>>>>
>>>>>>>  Thanks,
>>>>>>>  Malithi.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Aug 22, 2014 at 6:11 PM, Malithi Edirisinghe <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We started the implementation with Stanford NLP due to reasons
>>>>>>>> below.
>>>>>>>>
>>>>>>>> 1. Stanford NLP provides a rich regular expression support in
>>>>>>>> writing patterns over tokens, rather than working at character level 
>>>>>>>> with
>>>>>>>> normal java regular expressions.
>>>>>>>>
>>>>>>>> 2. Stanford NLP can extract grammatical relationships from the
>>>>>>>> parsed tree thus we can easily implement the 3rd query.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Malithi.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 21, 2014 at 12:58 PM, Malithi Edirisinghe <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Suho,
>>>>>>>>>
>>>>>>>>> Since Named Entity Recognition is supported by both libraries we
>>>>>>>>> can implement the first function from any of them. Both can identify
>>>>>>>>> entities like person, location, organization, etc. For the fourth 
>>>>>>>>> function
>>>>>>>>> we found a way that we can simply define dictionaries in openNLP. 
>>>>>>>>> There is
>>>>>>>>> a class called  DictionaryNameFinder which takes a Dictionary and 
>>>>>>>>> identify
>>>>>>>>> any matching entry in the sentence with the dictionary. In Stanford 
>>>>>>>>> NLP, we
>>>>>>>>> could find that there is an implementation for a Dictionary; but yet 
>>>>>>>>> we
>>>>>>>>> couldn't find a way of using
>>>>>>>>> that for our requirement. It lacks samples, and seems like we
>>>>>>>>> should look into their code to find how they have used it. We will 
>>>>>>>>> work on
>>>>>>>>> it. Anyhow I think it should be possible to define such Dictionary in
>>>>>>>>> Stanford NLP also.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Malithi.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Aug 21, 2014 at 10:09 AM, Sriskandarajah Suhothayan <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Thats a good compression.
>>>>>>>>>> Based on this I believe we have issues in implementing functions
>>>>>>>>>> 2 & 3 using OpenNLP.
>>>>>>>>>> Can you evaluate others functions as well.
>>>>>>>>>>
>>>>>>>>>> Suho
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 21, 2014 at 9:54 AM, Chanuka Dissanayake <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> We did a study on both OpenNLP and Stanford NLP libraries and
>>>>>>>>>>> looked at the features that could support our implementation.
>>>>>>>>>>> Our findings are summarised below.
>>>>>>>>>>>
>>>>>>>>>>> It seems that Stanford NLP has better capabilities when
>>>>>>>>>>> considering support for regular expressons and parsing.
>>>>>>>>>>> We would like to discuss this further and choose the appropriate
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    Feature OpenNLP StanfordNLP  Named Entity Recognizer Will
>>>>>>>>>>> identify the 
>>>>>>>>>>> person,location,organization,time,date,money,percentage inside
>>>>>>>>>>> the given sentence but sentence need to be tokenized first. Includes
>>>>>>>>>>> a 4 class model trained for CoNLL, a 7 class model trained for MUC, 
>>>>>>>>>>> and a 3
>>>>>>>>>>> class model trained on both data sets for the intersection of those 
>>>>>>>>>>> class
>>>>>>>>>>> sets.
>>>>>>>>>>> 3 class: Location, Person, Organization
>>>>>>>>>>> 4 class: Location, Person, Organization, Misc
>>>>>>>>>>> 7 class: Time, Location, Organization, Person, Money, Percent,
>>>>>>>>>>> Date
>>>>>>>>>>>  POS Tagger Identify:
>>>>>>>>>>> VP(Verb Phrase) ,NP(Noun Phrase) ,JJ(Adjective)…etc
>>>>>>>>>>>
>>>>>>>>>>> Input: Hi. How are you? This is Mike
>>>>>>>>>>> output: Hi_NNP How_WRB are_VBP you? _JJ This_DT is_VBZ Mike._NNP 
>>>>>>>>>>> Label
>>>>>>>>>>> each token with the POS Tag, such as noun, verb, adjective, etc.,
>>>>>>>>>>> Tokenizing Separates the words which have white spaces
>>>>>>>>>>> in-between by default. Otherwise it can be trained to tokanize by 
>>>>>>>>>>> different
>>>>>>>>>>> options. Can tokenize the text either by whitespace or as per
>>>>>>>>>>> the options defined  Parsing Once given a tokanized sentence,
>>>>>>>>>>> It will construct the tree structure. This works out the
>>>>>>>>>>> grammatical structure of sentences in a tree structure. The parser 
>>>>>>>>>>> provides
>>>>>>>>>>> Stanford Dependencies as well. They represent the grammatical 
>>>>>>>>>>> relations
>>>>>>>>>>> between words in a sentence. Dependecies are triplets: name of the
>>>>>>>>>>> relation, governor and dependent.
>>>>>>>>>>> Ex: Bell, based in Los Angeles, makes and distributes
>>>>>>>>>>> electronic, computer and building products.
>>>>>>>>>>> Dependency: nsubj(distributes-10, Bell-1)
>>>>>>>>>>> This is like saying “the subject of distributes is Bell.”  Sentence
>>>>>>>>>>> Detection Detect sentence boundaries given a paragraph. Available
>>>>>>>>>>> as ssplit. Can split sentences as per the options defined  Regular
>>>>>>>>>>> Expressions Character wise regular expression only. Cannot
>>>>>>>>>>> identify named entities or PoS tags via regular expression Two
>>>>>>>>>>> tools are provided to deal with regular expressions.
>>>>>>>>>>> RegexNER:Can define simple rules with regular expressions and
>>>>>>>>>>> label entities with NE labels that are not provided.
>>>>>>>>>>> Ex: Bachelor of (Arts|Laws|Science|Engineering) DEGREE
>>>>>>>>>>> This rule will label tokens matching with the regex in first
>>>>>>>>>>> column as DEGREE
>>>>>>>>>>> TokensRegex: Can identify patterns over a list of tokens. In
>>>>>>>>>>> addition to java regex matching this provides syntax to match part 
>>>>>>>>>>> of
>>>>>>>>>>> speech tags, named entity tags and lemma.
>>>>>>>>>>>  Ex: [ { tag:VBD } ], /University/ /of/ [{ ner:LOCATION }]
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Chanuka.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 19, 2014 at 11:11 PM, Sriskandarajah Suhothayan <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1 looks good
>>>>>>>>>>>>
>>>>>>>>>>>> Suho
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Aug 19, 2014 at 9:56 PM, Srinath Perera <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Look good. If possible we should do this with OpenNLP as it
>>>>>>>>>>>>> has apache licence. However, I could not find NLP regex impl 
>>>>>>>>>>>>> there. Please
>>>>>>>>>>>>> look at it in detial.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --Srinath
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Aug 19, 2014 at 9:52 PM, Malithi Edirisinghe <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We are working on a NLP Toolbox improvement in CEP. The main
>>>>>>>>>>>>>> idea of this improvement is to use a NLP library and let user do 
>>>>>>>>>>>>>> some NLP
>>>>>>>>>>>>>> operations as Siddhi extensions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So in our implementation we have decided to support following
>>>>>>>>>>>>>> NLP operations.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *1. findNameEntityType(sentence, entityType)*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Description:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This operation takes a sentence and a predefined entity type
>>>>>>>>>>>>>> as it's inputs. It will return noun(s) in the sentence that 
>>>>>>>>>>>>>> match the
>>>>>>>>>>>>>> defined entity type, as event(s).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *inputs:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> sentence  : sentence to be processed
>>>>>>>>>>>>>> entityType: predefined entity type
>>>>>>>>>>>>>>  ORGANIZATION
>>>>>>>>>>>>>> NAME
>>>>>>>>>>>>>>  LOCATION
>>>>>>>>>>>>>>  *output:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> matching noun(s) as event(s)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *example:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  inputs:
>>>>>>>>>>>>>> sentence   : Alice works at WSO2
>>>>>>>>>>>>>>  entityType : NAME
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  output: Alice
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *2. findNLRegexPattern(sentence, regex)*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Description:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This operation takes a sentence and a regular expression as
>>>>>>>>>>>>>> it's inputs. It will return each match in the sentence, as an 
>>>>>>>>>>>>>> event.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *inputs:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> sentence  : sentence to be processed
>>>>>>>>>>>>>> regex       : regular expression to be matched
>>>>>>>>>>>>>>  *output:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> matching pharase(s) as event(s)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *example:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> inputs:
>>>>>>>>>>>>>>  sentence   : WSO2 was found in 2005
>>>>>>>>>>>>>>  regex        : \\d{4}
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  output: 2005
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *3. findRelationship(sentence, regex)*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Description:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This operation takes a sentence and a regular expression as
>>>>>>>>>>>>>> it's inputs. For each relationship extracted from the regular 
>>>>>>>>>>>>>> expression
>>>>>>>>>>>>>> the operation will return a triplet; subject, object and 
>>>>>>>>>>>>>> relationship as an
>>>>>>>>>>>>>> event.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *inputs:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> sentence  : sentence to be processed
>>>>>>>>>>>>>> regex       : regular expression to extract the relationship
>>>>>>>>>>>>>>  *output:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> triplet(s) of (subject, object, relationship) as event(s)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *example:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  inputs:
>>>>>>>>>>>>>> sentence   : Bob works for WSO2
>>>>>>>>>>>>>>  regex        : works for
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  output: (Bob, WSO2, works for)
>>>>>>>>>>>>>>  *4. findNameEntityTypeViaDictionary(sentence, dictionary,
>>>>>>>>>>>>>> entityType)*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Description:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This operation takes a sentence, dictionary file and a
>>>>>>>>>>>>>> predefined entity type as it's inputs. It will return noun(s) in 
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> sentence of the defined entity type, that also exists in the 
>>>>>>>>>>>>>> dictionary as
>>>>>>>>>>>>>> event(s).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *inputs:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> sentence   : sentence to be processed
>>>>>>>>>>>>>> dictionary  : dictionary of entities of the defined entity
>>>>>>>>>>>>>> type
>>>>>>>>>>>>>> entityType : predefined entity type
>>>>>>>>>>>>>>  ORGANIZATION
>>>>>>>>>>>>>>   NAME
>>>>>>>>>>>>>>  LOCATION
>>>>>>>>>>>>>>  *output:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> matching noun(s) as event(s)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *example:*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  inputs:
>>>>>>>>>>>>>> sentence    : Bob works at WSO2
>>>>>>>>>>>>>>  dictionary   : (WSO2,ORACLE,IBM)
>>>>>>>>>>>>>> entityType  : ORGANIZATION
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> output: WSO2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Each NLP operation defined here will be implemented as a
>>>>>>>>>>>>>> transformer extension to Siddhi.
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Malithi Edirisinghe*
>>>>>>>>>>>>>> Senior Software Engineer
>>>>>>>>>>>>>> WSO2 Inc.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mobile : +94 (0) 718176807
>>>>>>>>>>>>>>  [email protected]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> ============================
>>>>>>>>>>>>> Director, Research, WSO2 Inc.
>>>>>>>>>>>>> Visiting Faculty, University of Moratuwa
>>>>>>>>>>>>> Member, Apache Software Foundation
>>>>>>>>>>>>> Research Scientist, Lanka Software Foundation
>>>>>>>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>>>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>>>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>>>>>>>> Phone: 0772360902
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>> *S. Suhothayan*
>>>>>>>>>>>> Technical Lead & Team Lead of WSO2 Complex Event Processor
>>>>>>>>>>>>  *WSO2 Inc. *http://wso2.com
>>>>>>>>>>>> * <http://wso2.com/>*
>>>>>>>>>>>> lean . enterprise . middleware
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *cell: (+94) 779 756 757 <%28%2B94%29%20779%20756%20757> |
>>>>>>>>>>>> blog: http://suhothayan.blogspot.com/ 
>>>>>>>>>>>> <http://suhothayan.blogspot.com/>
>>>>>>>>>>>> twitter: http://twitter.com/suhothayan 
>>>>>>>>>>>> <http://twitter.com/suhothayan> |
>>>>>>>>>>>> linked-in: http://lk.linkedin.com/in/suhothayan
>>>>>>>>>>>> <http://lk.linkedin.com/in/suhothayan>*
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Chanuka Dissanayake
>>>>>>>>>>> *Software Engineer | **WSO2 Inc.*; http://wso2.com
>>>>>>>>>>>
>>>>>>>>>>> Mobile: +94 71 33 63 596
>>>>>>>>>>> Email: [email protected]
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> *S. Suhothayan*
>>>>>>>>>> Technical Lead & Team Lead of WSO2 Complex Event Processor
>>>>>>>>>>  *WSO2 Inc. *http://wso2.com
>>>>>>>>>> * <http://wso2.com/>*
>>>>>>>>>> lean . enterprise . middleware
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *cell: (+94) 779 756 757 <%28%2B94%29%20779%20756%20757> | blog:
>>>>>>>>>> http://suhothayan.blogspot.com/ <http://suhothayan.blogspot.com/> 
>>>>>>>>>> twitter:
>>>>>>>>>> http://twitter.com/suhothayan <http://twitter.com/suhothayan> | 
>>>>>>>>>> linked-in:
>>>>>>>>>> http://lk.linkedin.com/in/suhothayan 
>>>>>>>>>> <http://lk.linkedin.com/in/suhothayan>*
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> *Malithi Edirisinghe*
>>>>>>>>> Senior Software Engineer
>>>>>>>>> WSO2 Inc.
>>>>>>>>>
>>>>>>>>> Mobile : +94 (0) 718176807
>>>>>>>>> [email protected]
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> *Malithi Edirisinghe*
>>>>>>>> Senior Software Engineer
>>>>>>>> WSO2 Inc.
>>>>>>>>
>>>>>>>> Mobile : +94 (0) 718176807
>>>>>>>> [email protected]
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> *Malithi Edirisinghe*
>>>>>>> Senior Software Engineer
>>>>>>> WSO2 Inc.
>>>>>>>
>>>>>>> Mobile : +94 (0) 718176807
>>>>>>> [email protected]
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> ============================
>>>>>> Director, Research, WSO2 Inc.
>>>>>> Visiting Faculty, University of Moratuwa
>>>>>> Member, Apache Software Foundation
>>>>>> Research Scientist, Lanka Software Foundation
>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>> Site: http://people.apache.org/~hemapani/
>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>> Phone: 0772360902
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ============================
>>>>> Director, Research, WSO2 Inc.
>>>>> Visiting Faculty, University of Moratuwa
>>>>> Member, Apache Software Foundation
>>>>> Research Scientist, Lanka Software Foundation
>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>> Site: http://people.apache.org/~hemapani/
>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>> Phone: 0772360902
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Chanuka Dissanayake
>>>> *Software Engineer | **WSO2 Inc.*; http://wso2.com
>>>>
>>>> Mobile: +94 71 33 63 596
>>>> Email: [email protected]
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> *Malithi Edirisinghe*
>>> Senior Software Engineer
>>> WSO2 Inc.
>>>
>>> Mobile : +94 (0) 718176807
>>> [email protected]
>>>
>>
>>
>>
>> --
>> Chanuka Dissanayake
>> *Software Engineer | **WSO2 Inc.*; http://wso2.com
>>
>> Mobile: +94 71 33 63 596
>> Email: [email protected]
>>
>
>
>
> --
>
> *S. Suhothayan*
> Technical Lead & Team Lead of WSO2 Complex Event Processor
>  *WSO2 Inc. *http://wso2.com
> * <http://wso2.com/>*
> lean . enterprise . middleware
>
>
> *cell: (+94) 779 756 757 <%28%2B94%29%20779%20756%20757> | blog:
> http://suhothayan.blogspot.com/ <http://suhothayan.blogspot.com/> twitter:
> http://twitter.com/suhothayan <http://twitter.com/suhothayan> | linked-in:
> http://lk.linkedin.com/in/suhothayan <http://lk.linkedin.com/in/suhothayan>*
>



-- 
Chanuka Dissanayake
*Software Engineer | **WSO2 Inc.*; http://wso2.com

Mobile: +94 71 33 63 596
Email: [email protected]
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture

Reply via email to