Hi Jayani,

I think [1] has a good list of regex patterns to start from.

Note that day/month names are language-specific. So if we want to
support those, we would need to create a dictionary and select the
right entries based on the language detected for the text.
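
A rough sketch of the dictionary idea in Java. Everything in it is an
assumption for illustration only: the class name MonthNameDictionary, the
two hard-coded languages, and the simple "day month-name year" pattern.
It is not existing Stanbol code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Stanbol): month names keyed by language. */
final class MonthNameDictionary {

    // Assumed dictionary: ISO language code -> month names for that language.
    private static final Map<String, List<String>> MONTHS = new HashMap<>();
    static {
        MONTHS.put("en", Arrays.asList("January", "February", "March", "April",
                "May", "June", "July", "August", "September", "October",
                "November", "December"));
        // "M\u00e4rz" is "Maerz" with an a-umlaut, escaped to stay ASCII-safe.
        MONTHS.put("de", Arrays.asList("Januar", "Februar", "M\u00e4rz", "April",
                "Mai", "Juni", "Juli", "August", "September", "Oktober",
                "November", "Dezember"));
    }

    /** Builds a "day month-name year" pattern from the names of one language. */
    static Pattern datePattern(String language) {
        List<String> names = MONTHS.get(language);
        if (names == null) {
            throw new IllegalArgumentException("No month names for: " + language);
        }
        StringBuilder alternation = new StringBuilder();
        for (String name : names) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(name));
        }
        // Day 1-31 with an optional trailing dot, month name, 4-digit year.
        return Pattern.compile("\\b(0?[1-9]|[12][0-9]|3[01])\\.?\\s+("
                + alternation + ")\\s+(\\d{4})\\b");
    }

    /** Returns all substrings of text matching the pattern for the language. */
    static List<String> findDates(String text, String language) {
        List<String> found = new ArrayList<>();
        Matcher matcher = datePattern(language).matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findDates("Meeting on 27 January 2014.", "en"));
        System.out.println(findDates("Termin am 27. Januar 2014", "de"));
    }
}
```

The language code passed to findDates would come from whatever language
detection ran on the text before. The JDK's java.text.DateFormatSymbols
might be another source for such names instead of a hand-written map.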

best
Rupert

[1] http://regexlib.com/DisplayPatterns.aspx?cattabindex=4&categoryId=5&AspxAutoDetectCookieSupport=1

On Mon, Jan 27, 2014 at 1:50 PM, Jayani Withanawasam
<jayaniwithanawa...@gmail.com> wrote:
> Thank you Antonio
>
> Hi all,
>
> I have done a bit of research on this task and I need your opinion on
> "recognizing" temporal expressions from plain text.
> As per my understanding, 3 options are available to perform this task.
>
>
>    1. Statistical approach (E.g., Open NLP)
>    2. Rule based approach (linguistic grammar based APIs such as SUTime,
>    HeidelTime)
>    3. Simple regular expressions engine (simple temporal patterns)
>
>
> We already decided not to proceed with option 1. We will also not go
> for option 2 because of the license issue.
>
> So, with regard to option 3, there are a few possible approaches to identify
> whether a given expression is a temporal expression.
>
> Year - a 4-digit number within a specified range (E.g., 1100 - 2500)
> Month - Jan, January.., (1-12)
> Date - 1-31
> Day - Monday, Tuesday...
> Time - a.m., p.m.
>
> Also, to some extent we can infer temporal expressions from
> time-related prepositions such as "on", "in", "at", "since", etc.
>
> Do you think the above approach will give us sufficient results for the
> baseline implementation? Or do we need a more advanced approach, for example
> our own rule engine/grammar for date time extraction?
>
> On Mon, Jan 27, 2014 at 1:31 PM, Antonio David Perez Morales <
> ape...@zaizi.com> wrote:
>
>> Hi Jayani
>>
>> Perfect. If you want, I can help you with the implementation of this engine,
>> or with questions about the classes used in the Enhancement Engine or about
>> OSGi.
>>
>> Feel free to ask.
>>
>> Regards
>>
>>
>> On Mon, Jan 27, 2014 at 8:13 AM, Jayani Withanawasam <
>> jayaniwithanawa...@gmail.com> wrote:
>>
>> > Thank you Antonio and Rupert for your clarifications.
>> >
>> > So, we need to work on a date time extraction engine from scratch
>> > (without using any of the mentioned third party libraries) as the base line
>> > implementation.
>> >
>> > We will implement other possible approaches as advanced features later.
>> > Correct me if I'm wrong. I'm working on this and will keep you posted on the
>> > progress.
>> >
>> >
>> >
>> > On Mon, Jan 20, 2014 at 10:34 AM, Rupert Westenthaler <
>> > rupert.westentha...@gmail.com> wrote:
>> >
>> > > Hi Jayani, Antonio,
>> > >
>> > > With "base-line" I mean that it is IMHO important to have this
>> > > functionality also present in the default distribution of Stanbol.
>> > > With a Regex based solution this is possible. With implementations
>> > > based on GPL licensed projects it is not.
>> > >
>> > > Having a "base-line" implementation would allow users to start with
>> > > the Regex based DateExtractionEngine. If this one does not fit their
>> > > requirements, they would look for alternatives and find advanced
>> > > options that would require them to manually download and install
>> > > additional GPL licensed software.
>> > >
>> > > best
>> > > Rupert
>> > >
>> > >
>> > > On Fri, Jan 17, 2014 at 9:46 AM, Antonio David Perez Morales
>> > > <ape...@zaizi.com> wrote:
>> > > > Hi Jayani
>> > > >
>> > > > What Rupert means is that it would be good to have a "RegEx"
>> > > > Enhancement Engine which extracts/creates TextAnnotations based on
>> > > > regular expressions configured in the engine.
>> > > > This way you can configure one engine of this type and provide a
>> > > > regular expression for extracting dates and times.
>> > > >
>> > > > After that, we can take a look at the projects pointed out by Rupert
>> > > > in order to integrate them in Stanbol.
>> > > >
>> > > > Regards
>> > > >
>> > > >
>> > > > On Fri, Jan 17, 2014 at 9:39 AM, Jayani Withanawasam <
>> > > > jayaniwithanawa...@gmail.com> wrote:
>> > > >
>> > > >> Thank you Rupert and Anuj for your suggestions. I'm going through
>> > > >> the links you have provided.
>> > > >>
>> > > >> Rupert,
>> > > >>
>> > > >> What did you mean by a base-line engine that is directly integrated in
>> > > >> Stanbol with a Regex based approach?
>> > > >>
>> > > >> I would appreciate it if you could elaborate on this.
>> > > >>
>> > > >>
>> > > >> On Fri, Nov 29, 2013 at 11:35 AM, Rupert Westenthaler <
>> > > >> rupert.westentha...@gmail.com> wrote:
>> > > >>
>> > > >> > Hi Anuj
>> > > >> >
>> > > >> > On Thu, Nov 28, 2013 at 1:51 PM, Anuj Kumar <anujs...@gmail.com> wrote:
>> > > >> > > I second that. Regex will work better w.r.t. the default trained
>> > > >> > > model of OpenNLP.
>> > > >> >
>> > > >> > Both such projects do look interesting:
>> > > >> >
>> > > >> > > Also, take a look at this extractor-
>> > > >> > > https://code.google.com/p/heideltime/ and
>> > > >> >
>> > > >> > As this is GPLv3 you can not directly use it to implement an
>> > > >> > EnhancementEngine that is part of the Stanbol Codebase. Integrating
>> > > >> > it via a RESTful service would be an option.
>> > > >> >
>> > > >> > > Stanford's tagger- http://nlp.stanford.edu/downloads/sutime.shtml#!
>> > > >> >
>> > > >> > The same is true for SuTime as all Stanford NLP components are
>> > > >> > under GPL.
>> > > >> >
>> > > >> > If we want to integrate those projects I suggest extending the
>> > > >> > Stanbol RESTful NLP protocol [1] and service [2] so that it can
>> > > >> > represent date/time points and ranges. SuTime support could be added
>> > > >> > to the already existing Stanbol-Stanford integration [3]. For
>> > > >> > HeidelTime one would need to implement a similar component.
>> > > >> >
>> > > >> >
>> > > >> > But before integrating those I would prefer to have a base-line
>> > > >> > engine that is directly integrated in Stanbol. Looks like a Regex
>> > > >> > based approach could be sufficient for that. WDYT Jayani?
>> > > >> >
>> > > >> > best
>> > > >> > Rupert
>> > > >> >
>> > > >> > [1] https://issues.apache.org/jira/browse/STANBOL-878
>> > > >> > [2] https://issues.apache.org/jira/browse/STANBOL-892
>> > > >> > [3] https://github.com/westei/stanbol-stanfordnlp
>> > > >> >
>> > > >> > >
>> > > >> > > It will be useful to have a similar temporal expression
>> > > >> > > enhancement engine in Stanbol.
>> > > >> > >
>> > > >> > > Regards,
>> > > >> > > Anuj
>> > > >> > >
>> > > >> > >
>> > > >> > > On Thu, Nov 28, 2013 at 11:05 AM, Rupert Westenthaler <
>> > > >> > > rupert.westentha...@gmail.com> wrote:
>> > > >> > >
>> > > >> > >> Hi Jayani,
>> > > >> > >>
>> > > >> > >> I was not even aware that there exists a Time model for
>> > > >> > >> OpenNLP. Documentation shows that this uses a purely statistical
>> > > >> > >> model so I am wondering about the quality. Note also that OpenNLP
>> > > >> > >> only provides a prebuilt model for English [1].
>> > > >> > >>
>> > > >> > >> AFAIK OpenNLP will only provide you with the information that
>> > > >> > >> some tokens do represent a date. It will not provide you the
>> > > >> > >> parsed xsd:dateTime. So if you use this Engine you will still need
>> > > >> > >> to implement this part on your own. So most likely you will end up
>> > > >> > >> using regex patterns to parse the actual time from the Tokens
>> > > >> > >> marked by OpenNLP as time.
>> > > >> > >>
>> > > >> > >> So I am wondering if it is not better to start with Regex from
>> > > >> > >> the beginning. If you search for "Regex Date Time extraction" you
>> > > >> > >> can find a huge set of examples you could start from.
>> > > >> > >>
>> > > >> > >> best
>> > > >> > >> Rupert
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> [1] http://opennlp.sourceforge.net/models-1.5/
>> > > >> > >>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> On Thu, Nov 28, 2013 at 5:15 AM, Jayani Withanawasam
>> > > >> > >> <jayaniwithanawa...@gmail.com> wrote:
>> > > >> > >> > Hi Dileepa,
>> > > >> > >> >
>> > > >> > >> > Thank you so much for your valuable feedback. I'm working on
>> > > >> > >> > this.
>> > > >> > >> >
>> > > >> > >> >
>> > > >> > >> > On Mon, Nov 25, 2013 at 9:00 PM, Dileepa Jayakody
>> > > >> > >> > <dileepajayak...@gmail.com> wrote:
>> > > >> > >> >
>> > > >> > >> >> Hi Jayani,
>> > > >> > >> >>
>> > > >> > >> >> There are several enhancement engines in Stanbol developed
>> > > >> > >> >> based on OpenNLP (opennlp-ner, opennlp-sentence, opennlp-pos...
>> > > >> > >> >> see [1]). Each of these engines focuses on a particular
>> > > >> > >> >> enhancement aspect using OpenNLP.
>> > > >> > >> >> Therefore I think it's better to write a new engine for
>> > > >> > >> >> temporal extractions rather than extending the OpenNLP-NER engine.
>> > > >> > >> >>
>> > > >> > >> >> Thanks,
>> > > >> > >> >> Dileepa
>> > > >> > >> >>
>> > > >> > >> >> [1] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/opennlp
>> > > >> > >> >>
>> > > >> > >> >>
>> > > >> > >> >> On Mon, Nov 25, 2013 at 4:30 PM, Jayani Withanawasam <
>> > > >> > >> >> jayaniwithanawa...@gmail.com> wrote:
>> > > >> > >> >>
>> > > >> > >> >> > Hi,
>> > > >> > >> >> >
>> > > >> > >> >> > I'm researching adding a new enhancement engine for
>> > > >> > >> >> > extracting date and time (Temporal extraction) to Stanbol as
>> > > >> > >> >> > suggested by Rupert.
>> > > >> > >> >> >
>> > > >> > >> >> > I found that OpenNLP has an entity extraction unit for
>> > > >> > >> >> > date and time.
>> > > >> > >> >> > Also, I noticed that OpenNLP is already integrated into
>> > > >> > >> >> > Stanbol in the NER engine.
>> > > >> > >> >> >
>> > > >> > >> >> > So, as per my understanding, there are two options to
>> > > >> > >> >> > extract date and time.
>> > > >> > >> >> >
>> > > >> > >> >> > One is to have a separate enhancement engine for date and
>> > > >> > >> >> > time information extraction. Another one is to add date time
>> > > >> > >> >> > extraction as a code enhancement to the existing OpenNLP NER
>> > > >> > >> >> > engine.
>> > > >> > >> >> >
>> > > >> > >> >> > What is your opinion on this? Is there any other approach
>> > > >> > >> >> > which you think would be better?
>> > > >> > >> >> >
>> > > >> > >> >> > Thank you
>> > > >> > >> >> > Jayani
>> > > >> > >> >> >
>> > > >> > >> >>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> --
>> > > >> > >> | Rupert Westenthaler             rupert.westentha...@gmail.com
>> > > >> > >> | Bodenlehenstraße 11                             ++43-699-11108907
>> > > >> > >> | A-5500 Bischofshofen
>> > > >> > >>
>> > > >> >
>> > > >> >
>> > > >> >
>> > > >> > --
>> > > >> > | Rupert Westenthaler             rupert.westentha...@gmail.com
>> > > >> > | Bodenlehenstraße 11                             ++43-699-11108907
>> > > >> > | A-5500 Bischofshofen
>> > > >> >
>> > > >>
>> > > >
>> > > > --
>> > > >
>> > > > ------------------------------
>> > > > This message should be regarded as confidential. If you have received
>> > > > this email in error please notify the sender and destroy it immediately.
>> > > > Statements of intent shall only become binding when confirmed in hard
>> > > > copy by an authorised signatory.
>> > > >
>> > > > Zaizi Ltd is registered in England and Wales with the registration
>> > > > number 6440931. The Registered Office is Brook House, 229 Shepherds Bush
>> > > > Road, London W6 7AN.
>> > >
>> > >
>> > >
>> > > --
>> > > | Rupert Westenthaler             rupert.westentha...@gmail.com
>> > > | Bodenlehenstraße 11                             ++43-699-11108907
>> > > | A-5500 Bischofshofen
>> > >
>> >
>>



-- 
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
