On Fri, Mar 7, 2014 at 1:45 PM, Mark G <[email protected]> wrote:

> Hello all, I would like to propose the development of a Temporal Extraction
> addon. In the industry I work in, there is a need to support search of
> documents/entities by location and date mentions within the document text.
> I feel pretty good about the GeoEntityLinker addon for providing geocoding,
> but now I need to do date extraction.
>
> This addon I propose would take text and return a real java.util.Date
> with a precision, likely stored in an extended Span object. Initially, I
> would like it to handle year-, season-, month-, and day-level references.
> I don't care so much about day-of-week mentions and such; this is geared
> more towards supporting search and other datetime-related analytics.
>
> I have done this before to some degree a while back, and I have done
> research that leads to a couple different approaches:
> 1. All regex based extraction, and then a series of rules for cleaning the
> results.
> pros: no training, simple configuration, predictable output
> cons: regexes are confusing as they mature, regexes are not context
> specific
> 2. Machine learning (which the current OpenNLP NER model can do pretty well)
> pros: based on user data (if trained on it), adaptive etc
> cons: unpredictable strings as a result, hard to deal with.
> 3. A combination of Regex extraction and ML, in which the regex results are
> highly specific and used for sentence annotation for building a model.
> pros: model based on regex results on user data, adaptive, more recall than
> option 1, more predictable results than option 2
> cons: laborious processing (run regex extraction, produce annotations,
> build a model etc), still have to deal with unpredictable results
>
> My recommendation is option 3. I would like to write a regex based
> extractor that stands alone, but also write an impl for the
> modelbuilder-addon that would use the regex based extractor to create
> annotations for the model building process that occurs in the
> modelbuilder-addon (which automates annotation and model building based on
> user defined "known entities" and sentences). Option three would also
> provide "simple" and "advanced" versions of temporal extraction.
>
> This is a complex process; let us know if you see utility in this, and
> please provide any insights.
>
> Sorry for the long email.
>
> Thanks,
> Mark G
>
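
For reference, the regex-only approach (option 1 in the quoted message)
could be sketched roughly as below. This is a minimal illustration, not
an existing OpenNLP API: TemporalSketch, TemporalSpan and Precision are
hypothetical names, TemporalSpan only stands in for the "extended Span"
mentioned above, and resolving dates in UTC is an assumption. Seasonal
references are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these types exist in OpenNLP.
class TemporalSketch {

    enum Precision { YEAR, MONTH, DAY }

    // Stand-in for the proposed "extended Span": character offsets
    // plus the resolved Date and how precise the mention was.
    static final class TemporalSpan {
        final int start;
        final int end;
        final Date date;
        final Precision precision;

        TemporalSpan(int start, int end, Date date, Precision precision) {
            this.start = start;
            this.end = end;
            this.date = date;
            this.precision = precision;
        }
    }

    private static final List<String> MONTH_NAMES = Arrays.asList(
        "January", "February", "March", "April", "May", "June", "July",
        "August", "September", "October", "November", "December");

    // Optional month name, optional day, required four-digit year:
    // matches "March 7, 2014", "March 2014" and "2014".
    private static final Pattern DATE = Pattern.compile(
        "\\b(?:(" + String.join("|", MONTH_NAMES) + ")\\s+"
        + "(?:(\\d{1,2}),\\s*)?)?((?:19|20)\\d{2})\\b");

    static List<TemporalSpan> extract(String text) {
        List<TemporalSpan> spans = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            String monthName = m.group(1);
            String dayText = m.group(2);
            int year = Integer.parseInt(m.group(3));
            int month = monthName == null ? 0 : MONTH_NAMES.indexOf(monthName);

            // Assumption: missing fields default to the start of the period, in UTC.
            Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"), Locale.ROOT);
            cal.clear();
            cal.set(year, month, 1);

            Precision precision = monthName == null ? Precision.YEAR : Precision.MONTH;
            if (dayText != null) {
                int day = Integer.parseInt(dayText);
                // Rule-based cleanup step: drop impossible days such as "February 31".
                if (day < 1 || day > cal.getActualMaximum(Calendar.DAY_OF_MONTH)) {
                    continue;
                }
                cal.set(Calendar.DAY_OF_MONTH, day);
                precision = Precision.DAY;
            }
            spans.add(new TemporalSpan(m.start(), m.end(), cal.getTime(), precision));
        }
        return spans;
    }
}
```

Calling TemporalSketch.extract(text) returns one TemporalSpan per
mention. In option 3, those spans would presumably be what gets turned
into sentence annotations for the model-building step.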



-- 

Adriano Araújo Santos


*"A mind that opens itself to a new idea will never return to its
original size."*
