Hi Jayani,

What Rupert means is that it would be good to have a "RegEx" Enhancement Engine that creates TextAnnotations based on regular expressions configured in the engine. This way you can configure one engine of this type and provide it with a regular expression to extract dates and times.
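To make the idea a bit more concrete, here is a rough sketch using only plain java.util.regex. It does not use the Stanbol API: the class name RegexDateTimeSketch, the Span holder (a stand-in for what a TextAnnotation would carry) and the pattern itself are assumptions made up for this illustration. An actual engine would read its patterns from the engine configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: plain java.util.regex, no Stanbol classes involved.
class RegexDateTimeSketch {

    // Hypothetical stand-in for the data a TextAnnotation would carry:
    // the matched text plus its character offsets in the content.
    static final class Span {
        final int start;
        final int end;
        final String text;

        Span(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    // Assumed example pattern: ISO-like dates (2014-01-17), optionally
    // followed by a 24h time (09:39). Other formats need other patterns.
    static final Pattern DATE_TIME = Pattern.compile(
            "\\b(\\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01])"
            + "(?:[ T]([01]\\d|2[0-3]):([0-5]\\d))?\\b");

    // Scans the content and returns one Span per regex match.
    static List<Span> extract(String content) {
        List<Span> spans = new ArrayList<>();
        Matcher m = DATE_TIME.matcher(content);
        while (m.find()) {
            spans.add(new Span(m.start(), m.end(), m.group()));
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Sent on 2014-01-17 09:39, first reply on 2013-11-25.";
        for (Span s : extract(text)) {
            System.out.println(s.start + "-" + s.end + ": " + s.text);
        }
    }
}
```

The capture groups (year, month, day, hour, minute) are what one would later use to build the parsed date/time value, which is the part OpenNLP would not give you either.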
After that, we can take a look at the projects pointed out by Rupert and see how they could be integrated into Stanbol.

Regards

On Fri, Jan 17, 2014 at 9:39 AM, Jayani Withanawasam <jayaniwithanawa...@gmail.com> wrote:

> Thank you Rupert and Anuj for your suggestions. I'm going through the links
> you have provided.
>
> Rupert,
>
> What did you mean by a base-line engine that is directly integrated in
> Stanbol with a Regex based approach?
>
> I would appreciate it if you could elaborate on this.
>
> On Fri, Nov 29, 2013 at 11:35 AM, Rupert Westenthaler <rupert.westentha...@gmail.com> wrote:
>
> > Hi Anuj
> >
> > On Thu, Nov 28, 2013 at 1:51 PM, Anuj Kumar <anujs...@gmail.com> wrote:
> > > I second that. Regex will work better w.r.t. the default trained model of
> > > OpenNLP.
> >
> > Both of these projects do look interesting:
> >
> > > Also, take a look at this extractor- https://code.google.com/p/heideltime/ and
> >
> > As this is GPLv3 you can not directly use it to implement an
> > EnhancementEngine that is part of the Stanbol Codebase. Integrating it
> > via a RESTful service would be an option.
> >
> > > Stanford's tagger- http://nlp.stanford.edu/downloads/sutime.shtml#!
> >
> > The same is true for SuTime as all Stanford NLP components are under GPL.
> >
> > If we want to integrate those projects I suggest extending the Stanbol
> > RESTful NLP protocol [1] and service [2] so that they can represent
> > date/time points and ranges. SuTime support could be added to the
> > already existing Stanbol-Stanford integration [3]. For HeidelTime one
> > would need to implement a similar component.
> >
> > But before integrating those I would prefer to have a base-line engine
> > that is directly integrated in Stanbol. Looks like a Regex based
> > approach could be sufficient for that. WDYT Jayani?
> >
> > best
> > Rupert
> >
> > [1] https://issues.apache.org/jira/browse/STANBOL-878
> > [2] https://issues.apache.org/jira/browse/STANBOL-892
> > [3] https://github.com/westei/stanbol-stanfordnlp
> >
> > > It will be useful to have a similar temporal expression enhancement engine in
> > > Stanbol.
> > >
> > > Regards,
> > > Anuj
> > >
> > > On Thu, Nov 28, 2013 at 11:05 AM, Rupert Westenthaler <rupert.westentha...@gmail.com> wrote:
> > >
> > >> Hi Jayani,
> > >>
> > >> I was not even aware that there exists a Time model for OpenNLP.
> > >> Documentation shows that this uses a purely statistical model so I am
> > >> wondering about the quality. Note also that OpenNLP only provides a
> > >> prebuilt model for English [1].
> > >>
> > >> AFAIK OpenNLP will only provide you with the information that some
> > >> tokens do represent a date. It will not provide you the parsed
> > >> xsd:dateTime. So if you use this Engine you will still need to
> > >> implement this part on your own. So most likely you will end up using
> > >> regex patterns to parse the actual time from the Tokens marked by
> > >> OpenNLP as time.
> > >>
> > >> So I am wondering if it is not better to start with Regex from the
> > >> beginning. If you search for "Regex Date Time extraction" you can
> > >> find a huge set of examples you could start from.
> > >>
> > >> best
> > >> Rupert
> > >>
> > >> [1] http://opennlp.sourceforge.net/models-1.5/
> > >>
> > >> On Thu, Nov 28, 2013 at 5:15 AM, Jayani Withanawasam <jayaniwithanawa...@gmail.com> wrote:
> > >> > Hi Dileepa,
> > >> >
> > >> > Thank you so much for your valuable feedback. I'm working on this.
> > >> >
> > >> > On Mon, Nov 25, 2013 at 9:00 PM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
> > >> >
> > >> >> Hi Jayani,
> > >> >>
> > >> >> There are several enhancement engines in Stanbol developed based on
> > >> >> OpenNLP.
> > >> >> (opennlp-ner, opennlp-sentence, opennlp-pos... See [1]) Each of
> > >> >> these engines focuses on a particular enhancement aspect using OpenNLP.
> > >> >> Therefore I think it's better to write a new engine for temporal
> > >> >> extractions rather than extending the OpenNLP-NER engine.
> > >> >>
> > >> >> Thanks,
> > >> >> Dileepa
> > >> >>
> > >> >> [1] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/opennlp
> > >> >>
> > >> >> On Mon, Nov 25, 2013 at 4:30 PM, Jayani Withanawasam <jayaniwithanawa...@gmail.com> wrote:
> > >> >>
> > >> >> > Hi,
> > >> >> >
> > >> >> > I'm researching how to add a new enhancement engine for extracting date and
> > >> >> > time (Temporal extraction) to Stanbol as suggested by Rupert.
> > >> >> >
> > >> >> > I found that OpenNLP has an entity extraction unit for
> > >> >> > date and time.
> > >> >> > Also, I noticed that OpenNLP is already integrated into Stanbol in the NER
> > >> >> > engine.
> > >> >> >
> > >> >> > So, as per my understanding, there are two options to extract date and
> > >> >> > time.
> > >> >> >
> > >> >> > One is to have a separate enhancement engine for date and time information
> > >> >> > extraction. Another one is to add date time extraction as a code
> > >> >> > enhancement to the existing OpenNLP NER engine.
> > >> >> >
> > >> >> > What is your opinion on this? Is there any other approach which you think
> > >> >> > would be better?
> > >> >> >
> > >> >> > Thank you
> > >> >> > Jayani
> > >>
> > >> --
> > >> | Rupert Westenthaler rupert.westentha...@gmail.com
> > >> | Bodenlehenstraße 11 ++43-699-11108907
> > >> | A-5500 Bischofshofen