Thank you Antonio.

Hi all,

I have done some research on this task and would like your opinion on
"recognizing" temporal expressions in plain text. As I understand it, there
are three options for this task:

1. Statistical approach (e.g., OpenNLP)
2. Rule-based approach (linguistic, grammar-based APIs such as SUTime and
   HeidelTime)
3. Simple regular-expression engine (simple temporal patterns)

We have already decided not to proceed with option 1. Option 2 is also ruled
out because of the license issue.

For option 3, there are a few possible ways to decide whether a given
expression is a temporal expression:

Year  - a four-digit number within a specified range (e.g., 1100 - 2500)
Month - Jan, January, ... (or 1-12)
Date  - 1-31
Day   - Monday, Tuesday, ...
Time  - a.m., p.m.

To some extent we can also infer temporal expressions from time-related
prepositions such as "on", "in", "at" and "since".

Do you think this approach will give us sufficient results for the baseline
implementation? Or do we need a more advanced approach, for example our own
rule engine/grammar for date-time extraction?


On Mon, Jan 27, 2014 at 1:31 PM, Antonio David Perez Morales <
ape...@zaizi.com> wrote:

> Hi Jayani
>
> Perfect. I can help you if you want in the implementation of this engine or
> in questions about the classes used in the Enhancement Engine or about
> OSGi.
>
> Feel free to ask.
>
> Regards
>
>
> On Mon, Jan 27, 2014 at 8:13 AM, Jayani Withanawasam <
> jayaniwithanawa...@gmail.com> wrote:
>
> > Thank you Antonio and Rupert for your clarifications.
> >
> > So, we need to work on a date-time extraction engine from scratch
> > (without using any of the mentioned third-party libraries) as the
> > baseline implementation.
> >
> > We will implement other possible approaches as advanced features later.
> > Correct me if I'm wrong. I'm working on this and will keep you posted on
> > the progress.
> >
> > On Mon, Jan 20, 2014 at 10:34 AM, Rupert Westenthaler <
> > rupert.westentha...@gmail.com> wrote:
> >
> > > Hi Jayani, Antonio,
> > >
> > > With "base-line" I mean, that it is IMHO important to have a
> > > functionality also present in the default distribution of Stanbol.
> > > With a Regex based solution this is possible. With implementations
> > > based on GPL licensed projects it is not.
> > >
> > > Having a "base-line" implementation would allow users to start with
> > > the Regex based DateExtractionEngine and, if this one does not fit
> > > their requirements, they would look for alternatives and find advanced
> > > options that would require them to manually download and install
> > > additional GPL licensed software.
> > >
> > > best
> > > Rupert
> > >
> > >
> > > On Fri, Jan 17, 2014 at 9:46 AM, Antonio David Perez Morales
> > > <ape...@zaizi.com> wrote:
> > > > Hi Jayani
> > > >
> > > > What Rupert means is that it would be good to have a "RegEx"
> > > > Enhancement Engine which extracts/creates TextAnnotations based on
> > > > regular expressions configured in the engine.
> > > > This way you can configure one engine of this type and provide a
> > > > regular expression for extracting dates and times.
> > > >
> > > > After that, we can take a look at the projects pointed out by Rupert
> > > > in order to integrate them in Stanbol.
> > > >
> > > > Regards
> > > >
> > > >
> > > > On Fri, Jan 17, 2014 at 9:39 AM, Jayani Withanawasam <
> > > > jayaniwithanawa...@gmail.com> wrote:
> > > >
> > > >> Thank you Rupert and Anuj for your suggestions. I'm going through
> > > >> the links you have provided.
> > > >>
> > > >> Rupert,
> > > >>
> > > >> What did you mean by base-line engine that is directly integrated in
> > > >> Stanbol with Regex based approach?
> > > >>
> > > >> Appreciate if you can further elaborate this.
> > > >>
> > > >>
> > > >> On Fri, Nov 29, 2013 at 11:35 AM, Rupert Westenthaler <
> > > >> rupert.westentha...@gmail.com> wrote:
> > > >>
> > > >> > Hi Anuj
> > > >> >
> > > >> > On Thu, Nov 28, 2013 at 1:51 PM, Anuj Kumar <anujs...@gmail.com>
> > > >> > wrote:
> > > >> > > I second that. Regex will work better w.r.t. the default trained
> > > >> > > model of OpenNLP.
> > > >> >
> > > >> > Both such projects do look interesting:
> > > >> >
> > > >> > > Also, take a look at this extractor-
> > > >> > > https://code.google.com/p/heideltime/ and
> > > >> >
> > > >> > As this is GPLv3 you can not directly use it to implement an
> > > >> > EnhancementEngine that is part of the Stanbol Codebase. Integrating
> > > >> > it via a RESTful service would be an option.
> > > >> >
> > > >> > > Stanford's tagger-
> > > >> > > http://nlp.stanford.edu/downloads/sutime.shtml#!
> > > >> >
> > > >> > The same is true for SuTime as all Stanford NLP components are
> > > >> > under GPL.
> > > >> >
> > > >> > If we want to integrate those projects I suggest to extend the
> > > >> > Stanbol RESTful NLP protocol [1] and service [2] so that it can
> > > >> > represent date/time points and ranges. SuTime support could be
> > > >> > added to the already existing Stanbol-Stanford integration [3].
> > > >> > For HeidelTime one would need to implement a similar component.
> > > >> >
> > > >> >
> > > >> > But before integrating those I would prefer to have a base-line
> > > >> > engine that is directly integrated in Stanbol. Looks like a Regex
> > > >> > based approach could be sufficient for that. WDYT Jayani?
> > > >> >
> > > >> > best
> > > >> > Rupert
> > > >> >
> > > >> > [1] https://issues.apache.org/jira/browse/STANBOL-878
> > > >> > [2] https://issues.apache.org/jira/browse/STANBOL-892
> > > >> > [3] https://github.com/westei/stanbol-stanfordnlp
> > > >> >
> > > >> > >
> > > >> > > It will be useful to have a similar temporal expression
> > > >> > > enhancement engine in Stanbol.
> > > >> > >
> > > >> > > Regards,
> > > >> > > Anuj
> > > >> > >
> > > >> > >
> > > >> > > On Thu, Nov 28, 2013 at 11:05 AM, Rupert Westenthaler <
> > > >> > > rupert.westentha...@gmail.com> wrote:
> > > >> > >
> > > >> > >> Hi Jayani,
> > > >> > >>
> > > >> > >> I was not even aware that there exists a Time model for
> > > >> > >> OpenNLP. Documentation shows that this uses a purely
> > > >> > >> statistical model so I am wondering about the quality. Note
> > > >> > >> also that OpenNLP only provides a prebuilt model for
> > > >> > >> English [1].
> > > >> > >>
> > > >> > >> AFAIK OpenNLP will only provide you with the information that
> > > >> > >> some tokens do represent a date. It will not provide you the
> > > >> > >> parsed xsd:dateTime. So if you use this Engine you will still
> > > >> > >> need to implement this part on your own. So most likely you
> > > >> > >> will end up using regex patterns to parse the actual time from
> > > >> > >> the Tokens marked by OpenNLP as time.
> > > >> > >>
> > > >> > >> So I am wondering if it is not better to start with Regex from
> > > >> > >> the beginning. If you search for "Regex Date Time extraction"
> > > >> > >> you can find a huge set of examples you could start from.
> > > >> > >>
> > > >> > >> best
> > > >> > >> Rupert
> > > >> > >>
> > > >> > >>
> > > >> > >> [1] http://opennlp.sourceforge.net/models-1.5/
> > > >> > >>
> > > >> > >>
> > > >> > >> On Thu, Nov 28, 2013 at 5:15 AM, Jayani Withanawasam
> > > >> > >> <jayaniwithanawa...@gmail.com> wrote:
> > > >> > >> > Hi Dileepa,
> > > >> > >> >
> > > >> > >> > Thank you so much for your valuable feedback. I'm working on
> > > >> > >> > this.
> > > >> > >> >
> > > >> > >> >
> > > >> > >> > On Mon, Nov 25, 2013 at 9:00 PM, Dileepa Jayakody <
> > > >> > >> > dileepajayak...@gmail.com> wrote:
> > > >> > >> >
> > > >> > >> >> Hi Jayani,
> > > >> > >> >>
> > > >> > >> >> There are several enhancement engines in Stanbol developed
> > > >> > >> >> based on OpenNLP (opennlp-ner, opennlp-sentence,
> > > >> > >> >> opennlp-pos... see [1]). Each of these engines focuses on a
> > > >> > >> >> particular enhancement aspect using OpenNLP.
> > > >> > >> >> Therefore I think it's better to write a new engine for
> > > >> > >> >> temporal extractions rather than extending the OpenNLP-NER
> > > >> > >> >> engine.
> > > >> > >> >>
> > > >> > >> >> Thanks,
> > > >> > >> >> Dileepa
> > > >> > >> >>
> > > >> > >> >> [1]
> > > >> > >> >> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/opennlp
> > > >> > >> >>
> > > >> > >> >>
> > > >> > >> >> On Mon, Nov 25, 2013 at 4:30 PM, Jayani Withanawasam <
> > > >> > >> >> jayaniwithanawa...@gmail.com> wrote:
> > > >> > >> >>
> > > >> > >> >> > Hi,
> > > >> > >> >> >
> > > >> > >> >> > I'm researching adding a new enhancement engine for
> > > >> > >> >> > extracting date and time (temporal extraction) to
> > > >> > >> >> > Stanbol, as suggested by Rupert.
> > > >> > >> >> >
> > > >> > >> >> > It turns out that OpenNLP has an entity extraction unit
> > > >> > >> >> > for date and time.
> > > >> > >> >> > Also, I noticed that OpenNLP is already integrated into
> > > >> > >> >> > Stanbol in the NER engine.
> > > >> > >> >> >
> > > >> > >> >> > So, as per my understanding, there are two options to
> > > >> > >> >> > extract date and time.
> > > >> > >> >> >
> > > >> > >> >> > One is to have a separate enhancement engine for date and
> > > >> > >> >> > time information extraction. The other is to add date
> > > >> > >> >> > time extraction as a code enhancement to the existing
> > > >> > >> >> > OpenNLP NER engine.
> > > >> > >> >> >
> > > >> > >> >> > What is your opinion on this? Is there any other approach
> > > >> > >> >> > that you think would be better?
> > > >> > >> >> >
> > > >> > >> >> > Thank you
> > > >> > >> >> > Jayani
> > > >> > >> >> >
> > > >> > >>
> > > >> > >> --
> > > >> > >> | Rupert Westenthaler             rupert.westentha...@gmail.com
> > > >> > >> | Bodenlehenstraße 11                      ++43-699-11108907
> > > >> > >> | A-5500 Bischofshofen
> > > >
> > > > --
> > > >
> > > > ------------------------------
> > > > This message should be regarded as confidential. If you have received
> > > > this email in error please notify the sender and destroy it
> > > > immediately. Statements of intent shall only become binding when
> > > > confirmed in hard copy by an authorised signatory.
> > > >
> > > > Zaizi Ltd is registered in England and Wales with the registration
> > > > number 6440931. The Registered Office is Brook House, 229 Shepherds
> > > > Bush Road, London W6 7AN.
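
For illustration, the "option 3" patterns listed at the top of this thread (year within a range, month names, weekday names, a.m./p.m. times) can be sketched with plain regular expressions. This is only a rough sketch under assumptions, not the Stanbol engine: it is written in Python for brevity (the patterns use only basic constructs, so they should carry over to java.util.regex with little change), the names TEMPORAL and find_temporal are made up for this example, the year window is the 1100 - 2500 example from the message, and the preposition hints ("on", "in", "at", "since") are not covered.

```python
import re

# Assumed year window, taken from the example range in the thread.
YEAR_MIN, YEAR_MAX = 1100, 2500

# Month and weekday names, full or three-letter form. Matching is
# case-sensitive on purpose so that the verb "may" is not tagged as a month.
MONTH = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|"
         r"Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)")
WEEKDAY = (r"(?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|"
           r"Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)")
DAY_NUM = r"(?:[12]\d|3[01]|0?[1-9])"  # day of month, 1-31

# One alternation with a named group per category. More specific patterns
# come first so that "Jan 27, 2014" is reported as one date rather than as
# a separate month and year.
TEMPORAL = re.compile(
    rf"(?P<date>\b{MONTH}\.?\s+{DAY_NUM},?\s+\d{{4}}\b"
    rf"|\b{DAY_NUM}\s+{MONTH}\.?,?\s+\d{{4}}\b)"
    rf"|(?P<time>\b(?:1[0-2]|0?[1-9])(?::[0-5]\d)?\s*(?i:[ap]\.m\.|[ap]m\b))"
    rf"|(?P<weekday>\b{WEEKDAY}\b)"
    rf"|(?P<month>\b{MONTH}\b)"
    rf"|(?P<year>\b\d{{4}}\b)"
)


def find_temporal(text):
    """Return (kind, matched_text, start, end) for each candidate expression."""
    hits = []
    for m in TEMPORAL.finditer(text):
        kind = m.lastgroup  # name of the alternative that matched
        # A bare four-digit number only counts as a year inside the window.
        if kind == "year" and not YEAR_MIN <= int(m.group()) <= YEAR_MAX:
            continue
        hits.append((kind, m.group(), m.start(), m.end()))
    return hits


if __name__ == "__main__":
    for hit in find_temporal("Meet on Mon, Jan 27, 2014 at 1:31 PM."):
        print(hit)
```

This only recognizes spans. Turning a match into a normalized value such as xsd:dateTime, which Rupert points out is needed in any case, would be a separate step on top of it.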