Hi Jayani,

I think [1] has a good list of regex patterns to start from.
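For illustration only, here is a minimal Java sketch (java.util.regex) of two numeric date patterns of the kind collected in such lists. The class name `NumericDatePatterns` and the two formats (yyyy-mm-dd and dd/mm/yyyy) are assumptions, not patterns copied from [1] or from Stanbol. They check value ranges but not month lengths.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds numeric dates in plain text with two sample patterns.
final class NumericDatePatterns {

    // yyyy-mm-dd, e.g. 2014-01-27 (month 01-12, day 01-31; month lengths are not checked)
    static final Pattern ISO_DATE = Pattern.compile(
            "\\b(\\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01])\\b");

    // dd/mm/yyyy, e.g. 27/01/2014 (day and month may have one or two digits)
    static final Pattern DMY_DATE = Pattern.compile(
            "\\b(0?[1-9]|[12]\\d|3[01])/(0?[1-9]|1[0-2])/(\\d{4})\\b");

    private NumericDatePatterns() {}

    // Returns every substring matched by either pattern; ISO matches are listed first.
    static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        for (Pattern p : new Pattern[] {ISO_DATE, DMY_DATE}) {
            Matcher m = p.matcher(text);
            while (m.find()) {
                found.add(m.group());
            }
        }
        return found;
    }
}
```

Note that a slash date such as 3/2/2014 is ambiguous between day/month and month/day order. A real engine would need a configured or language-dependent convention for it.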
Note that day/month names are language specific. So if we want to support those, we would need to create a dictionary and select the right options based on the language detected for the text.

best
Rupert

[1] http://regexlib.com/DisplayPatterns.aspx?cattabindex=4&categoryId=5&AspxAutoDetectCookieSupport=1

On Mon, Jan 27, 2014 at 1:50 PM, Jayani Withanawasam <jayaniwithanawa...@gmail.com> wrote:
> Thank you Antonio
>
> Hi all,
>
> I have done a bit of research on this task and I need your opinion on "recognizing" temporal expressions in plain text.
> As I understand it, three options are available to perform this task:
>
> 1. Statistical approach (e.g., OpenNLP)
> 2. Rule-based approach (linguistic grammar based APIs such as SUTime, HeidelTime)
> 3. Simple regular expression engine (simple temporal patterns)
>
> We have already decided not to proceed with option 1, and we will not go with option 2 either due to licensing issues.
>
> So, with regard to option 3, there are a few possible ways to identify whether a given expression is a temporal expression:
>
> Year - a 4-digit number within a specified range (e.g., 1100 - 2500)
> Month - Jan, January, ... (1-12)
> Date - 1-31
> Day - Monday, Tuesday, ...
> Time - a.m., p.m.
>
> Also, to some extent, we can infer temporal expressions from time-related prepositions such as "on", "in", "at", "since", etc.
>
> Do you think the above approach will provide sufficient results for the baseline implementation? Or do we need a more advanced approach, for example our own rule engine/grammar for date/time extraction?
>
> On Mon, Jan 27, 2014 at 1:31 PM, Antonio David Perez Morales <ape...@zaizi.com> wrote:
>
>> Hi Jayani
>>
>> Perfect. If you want, I can help you with the implementation of this engine, or with questions about the classes used in the Enhancement Engine or about OSGi.
>>
>> Feel free to ask.
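Jayani's component list above (a year within 1100-2500, month names or 1-12, a date of 1-31, a.m./p.m. times) and Rupert's note that month names are language specific could be combined roughly as in this Java sketch. Everything here is an assumption for illustration. The class `TemporalComponents`, the ISO 639-1 keys, and the two sample dictionaries ("en", "de") are hypothetical, and the language code is assumed to come from a separate language-detection step. Weekday names and preposition cues are not covered, but they could use the same dictionary approach.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: temporal components combined into a "day month year"
// pattern whose month names depend on the (already detected) language.
final class TemporalComponents {

    // Year: four digits restricted to 1100-2500.
    static final String YEAR = "(?:1[1-9]\\d{2}|2[0-4]\\d{2}|2500)";

    // Date (day of month): 1-31, optionally zero-padded.
    static final String DAY = "(?:0?[1-9]|[12]\\d|3[01])";

    // Time: hour 1-12, optional ":mm", then am/pm with optional dots ("1:50 p.m.", "10am").
    static final Pattern TIME = Pattern.compile(
            "\\b(?:1[0-2]|0?[1-9])(?::[0-5]\\d)?\\s*[ap]\\.?m\\b\\.?", Pattern.CASE_INSENSITIVE);

    // Month names per language code; only two sample languages are included.
    static final Map<String, String[]> MONTHS = Map.of(
            "en", new String[] {"january", "february", "march", "april", "may", "june",
                    "july", "august", "september", "october", "november", "december"},
            "de", new String[] {"januar", "februar", "m\u00e4rz", "april", "mai", "juni",
                    "juli", "august", "september", "oktober", "november", "dezember"});

    private TemporalComponents() {}

    // Builds the pattern for one language; a month matches its full name or first three letters.
    static Pattern dayMonthYear(String lang) {
        String[] names = MONTHS.get(lang);
        if (names == null) {
            throw new IllegalArgumentException("No month names configured for " + lang);
        }
        StringBuilder alt = new StringBuilder();
        for (String name : names) {
            if (alt.length() > 0) {
                alt.append('|');
            }
            alt.append(name).append('|').append(name, 0, 3);
        }
        return Pattern.compile(
                "\\b(?<day>" + DAY + ")\\.?\\s+(?<month>" + alt + ")\\.?\\s+(?<year>" + YEAR + ")\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    // Maps a matched month token back to 1-12 using the same dictionary (-1 if unknown).
    static int monthNumber(String lang, String token) {
        String t = token.toLowerCase(Locale.ROOT);
        String[] names = MONTHS.get(lang);
        for (int i = 0; i < names.length; i++) {
            if (names[i].equals(t) || names[i].substring(0, 3).equals(t)) {
                return i + 1;
            }
        }
        return -1;
    }

    // Returns the first "day month year" expression in the text as yyyy-mm-dd, if any.
    static Optional<String> firstDate(String text, String lang) {
        Matcher m = dayMonthYear(lang).matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(String.format(Locale.ROOT, "%s-%02d-%02d",
                m.group("year"), monthNumber(lang, m.group("month")),
                Integer.parseInt(m.group("day"))));
    }
}
```

For example, `firstDate("am 3. M\u00e4rz 2014", "de")` should yield `2014-03-03`, while the same text with `"en"` finds nothing. This is the language dependency Rupert points out.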
>>
>> Regards
>>
>> On Mon, Jan 27, 2014 at 8:13 AM, Jayani Withanawasam <jayaniwithanawa...@gmail.com> wrote:
>>
>> > Thank you Antonio and Rupert for your clarifications.
>> >
>> > So, we need to build a date/time extraction engine from scratch (without using any of the mentioned third-party libraries) as the baseline implementation.
>> >
>> > We will implement other possible approaches as advanced features later. Correct me if I'm wrong. I'm working on this and will keep you posted on the progress.
>> >
>> > On Mon, Jan 20, 2014 at 10:34 AM, Rupert Westenthaler <rupert.westentha...@gmail.com> wrote:
>> >
>> > > Hi Jayani, Antonio,
>> > >
>> > > By "base-line" I mean that, IMHO, it is important to have this functionality also present in the default distribution of Stanbol. With a Regex based solution this is possible. With implementations based on GPL licensed projects it is not.
>> > >
>> > > Having a "base-line" implementation would allow users to start with the Regex based DateExtractionEngine, and if this one does not fit their requirements they would look for alternatives and find advanced options that would require them to manually download and install additional GPL licensed software.
>> > >
>> > > best
>> > > Rupert
>> > >
>> > > On Fri, Jan 17, 2014 at 9:46 AM, Antonio David Perez Morales <ape...@zaizi.com> wrote:
>> > > > Hi Jayani
>> > > >
>> > > > What Rupert means is that it would be good to have a "RegEx" Enhancement Engine which extracts/creates TextAnnotations based on regular expressions configured in the engine.
>> > > > This way you can configure one engine of this type and provide a regular expression for extracting dates and times.
>> > > >
>> > > > After that, we can take a look at the projects pointed out by Rupert in order to integrate them into Stanbol.
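The configurable "RegEx" engine idea Antonio describes (one engine, regular expressions supplied as configuration, one annotation per match) can be sketched as follows. `RegexAnnotator` and its `Annotation` record are hypothetical stand-ins. They are not Stanbol's EnhancementEngine or TextAnnotation API, and wiring them into an OSGi component or a ContentItem is not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a configurable regex annotator: each configuration entry maps a
// type label to a regular expression, and every match becomes a simple annotation.
final class RegexAnnotator {

    // Minimal stand-in for a TextAnnotation: type label, character offsets, selected text.
    record Annotation(String type, int start, int end, String selectedText) {}

    private final Map<String, Pattern> patterns = new LinkedHashMap<>();

    // Compiles the configured expressions once; configuration order is preserved.
    RegexAnnotator(Map<String, String> config) {
        config.forEach((type, regex) -> patterns.put(type, Pattern.compile(regex)));
    }

    // Runs every configured pattern over the text; results are sorted by start offset.
    List<Annotation> annotate(String text) {
        List<Annotation> result = new ArrayList<>();
        for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
            Matcher m = e.getValue().matcher(text);
            while (m.find()) {
                result.add(new Annotation(e.getKey(), m.start(), m.end(), m.group()));
            }
        }
        result.sort(Comparator.comparingInt(Annotation::start));
        return result;
    }
}
```

Because the patterns are data rather than code, the same class could serve dates, times, or any other regex-detectable type. Overlapping matches from different patterns are all kept, so deciding between them is left to a later step.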
>> > > >
>> > > > Regards
>> > > >
>> > > > On Fri, Jan 17, 2014 at 9:39 AM, Jayani Withanawasam <jayaniwithanawa...@gmail.com> wrote:
>> > > >
>> > > >> Thank you Rupert and Anuj for your suggestions. I'm going through the links you have provided.
>> > > >>
>> > > >> Rupert,
>> > > >>
>> > > >> What did you mean by a base-line engine that is directly integrated in Stanbol with a Regex based approach?
>> > > >>
>> > > >> I would appreciate it if you could elaborate on this.
>> > > >>
>> > > >> On Fri, Nov 29, 2013 at 11:35 AM, Rupert Westenthaler <rupert.westentha...@gmail.com> wrote:
>> > > >>
>> > > >> > Hi Anuj
>> > > >> >
>> > > >> > On Thu, Nov 28, 2013 at 1:51 PM, Anuj Kumar <anujs...@gmail.com> wrote:
>> > > >> > > I second that. Regex will work better than the default trained model of OpenNLP.
>> > > >> >
>> > > >> > Both of these projects do look interesting:
>> > > >> >
>> > > >> > > Also, take a look at this extractor: https://code.google.com/p/heideltime/ and
>> > > >> >
>> > > >> > As this is GPLv3 you cannot directly use it to implement an EnhancementEngine that is part of the Stanbol codebase. Integrating it via a RESTful service would be an option.
>> > > >> >
>> > > >> > > Stanford's tagger: http://nlp.stanford.edu/downloads/sutime.shtml#!
>> > > >> >
>> > > >> > The same is true for SUTime, as all Stanford NLP components are under the GPL.
>> > > >> >
>> > > >> > If we want to integrate those projects I suggest extending the Stanbol RESTful NLP protocol [1] and service [2] so that they can represent date/time points and ranges. SUTime support could be added to the already existing Stanbol-Stanford integration [3]. For HeidelTime one would need to implement a similar component.
>> > > >> >
>> > > >> > But before integrating those I would prefer to have a base-line engine that is directly integrated in Stanbol. It looks like a Regex based approach could be sufficient for that. WDYT Jayani?
>> > > >> >
>> > > >> > best
>> > > >> > Rupert
>> > > >> >
>> > > >> > [1] https://issues.apache.org/jira/browse/STANBOL-878
>> > > >> > [2] https://issues.apache.org/jira/browse/STANBOL-892
>> > > >> > [3] https://github.com/westei/stanbol-stanfordnlp
>> > > >> >
>> > > >> > > It would be useful to have a similar temporal expression enhancement engine in Stanbol.
>> > > >> > >
>> > > >> > > Regards,
>> > > >> > > Anuj
>> > > >> > >
>> > > >> > > On Thu, Nov 28, 2013 at 11:05 AM, Rupert Westenthaler <rupert.westentha...@gmail.com> wrote:
>> > > >> > >
>> > > >> > >> Hi Jayani,
>> > > >> > >>
>> > > >> > >> I was not even aware that there is a Time model for OpenNLP. The documentation shows that it uses a purely statistical model, so I am wondering about the quality. Note also that OpenNLP only provides a prebuilt model for English [1].
>> > > >> > >>
>> > > >> > >> AFAIK OpenNLP will only tell you that some tokens represent a date. It will not provide you with the parsed xsd:dateTime. So if you use this engine you will still need to implement this part on your own, and most likely you will end up using regex patterns to parse the actual time from the tokens marked by OpenNLP as time.
>> > > >> > >>
>> > > >> > >> So I am wondering if it is not better to start with Regex from the beginning. If you search for "Regex Date Time extraction" you can find a huge set of examples you could start from.
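Rupert's point is that a detector, whether OpenNLP or a regex, only marks a span, and the value still has to be parsed. That parsing step might look like the following java.time sketch. The class `XsdDateParser` and its list of accepted formats are assumptions for illustration, and only English month names are handled. The output is the xsd:date lexical form (yyyy-MM-dd); an xsd:dateTime value would additionally need a time part.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch: converts the text of a span already marked as a date
// into the xsd:date lexical form (yyyy-MM-dd).
final class XsdDateParser {

    // Candidate input formats; STRICT resolving rejects impossible dates such as February 30.
    private static final List<DateTimeFormatter> FORMATS = List.of(
            strict("uuuu-MM-dd"),
            strict("d/M/uuuu"),
            strict("MMMM d, uuuu"),
            strict("d MMMM uuuu"));

    private XsdDateParser() {}

    private static DateTimeFormatter strict(String pattern) {
        return DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH)
                .withResolverStyle(ResolverStyle.STRICT);
    }

    // Tries each format in turn and returns the first successful parse as yyyy-MM-dd.
    static Optional<String> toXsdDate(String span) {
        for (DateTimeFormatter f : FORMATS) {
            try {
                return Optional.of(LocalDate.parse(span.trim(), f).toString());
            } catch (DateTimeParseException e) {
                // Not this format; try the next one.
            }
        }
        return Optional.empty();
    }
}
```

With this sketch, "January 27, 2014" and "27/1/2014" should both map to `2014-01-27`, and "2014-02-30" should be rejected rather than silently adjusted.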
>> > > >> > >>
>> > > >> > >> best
>> > > >> > >> Rupert
>> > > >> > >>
>> > > >> > >> [1] http://opennlp.sourceforge.net/models-1.5/
>> > > >> > >>
>> > > >> > >> On Thu, Nov 28, 2013 at 5:15 AM, Jayani Withanawasam <jayaniwithanawa...@gmail.com> wrote:
>> > > >> > >> > Hi Dileepa,
>> > > >> > >> >
>> > > >> > >> > Thank you so much for your valuable feedback. I'm working on this.
>> > > >> > >> >
>> > > >> > >> > On Mon, Nov 25, 2013 at 9:00 PM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
>> > > >> > >> >
>> > > >> > >> >> Hi Jayani,
>> > > >> > >> >>
>> > > >> > >> >> There are several enhancement engines in Stanbol built on OpenNLP (opennlp-ner, opennlp-sentence, opennlp-pos, ...; see [1]). Each of these engines focuses on a particular enhancement aspect using OpenNLP. Therefore I think it's better to write a new engine for temporal extraction rather than extending the OpenNLP-NER engine.
>> > > >> > >> >>
>> > > >> > >> >> Thanks,
>> > > >> > >> >> Dileepa
>> > > >> > >> >>
>> > > >> > >> >> [1] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/opennlp
>> > > >> > >> >>
>> > > >> > >> >> On Mon, Nov 25, 2013 at 4:30 PM, Jayani Withanawasam <jayaniwithanawa...@gmail.com> wrote:
>> > > >> > >> >>
>> > > >> > >> >> > Hi,
>> > > >> > >> >> >
>> > > >> > >> >> > I'm researching adding a new enhancement engine for extracting date and time (temporal extraction) to Stanbol, as suggested by Rupert.
>> > > >> > >> >> >
>> > > >> > >> >> > While doing so, I found that OpenNLP has an entity extraction unit for date and time. I also noticed that OpenNLP is already integrated into Stanbol in the NER engine.
>> > > >> > >> >> >
>> > > >> > >> >> > So, as I understand it, there are two options to extract date and time.
>> > > >> > >> >> >
>> > > >> > >> >> > One is to have a separate enhancement engine for date and time information extraction. The other is to add date/time extraction as a code enhancement to the existing OpenNLP NER engine.
>> > > >> > >> >> >
>> > > >> > >> >> > What is your opinion on this? Is there any other approach that you think would be better?
>> > > >> > >> >> >
>> > > >> > >> >> > Thank you
>> > > >> > >> >> > Jayani
>> > > >> > >>
>> > > >> > >> --
>> > > >> > >> | Rupert Westenthaler             rupert.westentha...@gmail.com
>> > > >> > >> | Bodenlehenstraße 11             ++43-699-11108907
>> > > >> > >> | A-5500 Bischofshofen
>> > > >>
>> > > >
>> > > > --
>> > > >
>> > > > ------------------------------
>> > > > This message should be regarded as confidential. If you have received this email in error please notify the sender and destroy it immediately. Statements of intent shall only become binding when confirmed in hard copy by an authorised signatory.
>> > > >
>> > > > Zaizi Ltd is registered in England and Wales with the registration number 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road, London W6 7AN.