RE: Extracting contact data

Ortelli, Gian Luca Thu, 14 Jan 2010 02:04:57 -0800

Well, we'll work out the exact definition empirically, as we run an
implementation over our data and look at the quality of the results.
For now, I would use the number of tokens between the finding
("a...@def.com") and the word that gives it context ("Contact").
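
As a rough illustration of that token-distance idea, here is a minimal
sketch in plain Java. It is not Lucene code: the class name, the
whitespace tokenizer, the loose e-mail pattern and the example.com
addresses are all assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene.
final class TokenProximity {
    // Deliberately loose e-mail pattern, good enough for a sketch only.
    private static final Pattern EMAIL =
            Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

    /** Splits on whitespace; a real system would use a proper tokenizer. */
    static String[] tokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Returns the indexes of all tokens that contain an e-mail address. */
    static List<Integer> emailTokenIndexes(String[] tokens) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (EMAIL.matcher(tokens[i]).find()) {
                hits.add(i);
            }
        }
        return hits;
    }

    /**
     * Number of tokens strictly between the finding at findingIdx and the
     * nearest occurrence of contextWord, or -1 if the word never occurs.
     */
    static int tokensBetween(String[] tokens, int findingIdx, String contextWord) {
        int best = -1;
        for (int i = 0; i < tokens.length; i++) {
            // Strip surrounding punctuation so "Contact:" matches "Contact".
            String bare = tokens[i].replaceAll("^\\W+|\\W+$", "");
            if (i != findingIdx && bare.equalsIgnoreCase(contextWord)) {
                int between = Math.abs(i - findingIdx) - 1;
                if (best < 0 || between < best) {
                    best = between;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] tokens = tokenize(
                "Contact: info@example.com or call the office, "
                + "otherwise write to sales@example.com");
        for (int idx : emailTokenIndexes(tokens)) {
            System.out.println(tokens[idx] + " -> "
                    + tokensBetween(tokens, idx, "Contact"));
        }
    }
}
```

A smaller distance would mean a higher rank; how to turn the distance
into a score is exactly the part we'd have to tune on real data.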

Anyway, replying to Karl: I'm not searching for a given
email/street/time interval/etc.; I need to extract EVERY
email/street/time interval/etc. from the text. That's the kind of need
for which you suggest a natural language processing tool.
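
To make the difference from searching concrete, here is a small sketch
of "extract every occurrence" in plain Java. It covers only the cases
that ARE regular enough for a regex; both patterns and the class name
are my own assumptions for illustration, and street addresses are left
out on purpose because they are the part regexes don't handle well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects every match of a pattern in a text.
final class ExtractAll {
    // Loose e-mail pattern, an assumption for this sketch.
    static final Pattern EMAIL =
            Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

    // Matches simple intervals such as "8am - 12pm" or "2pm-6:30pm".
    static final Pattern TIME_INTERVAL = Pattern.compile(
            "\\b\\d{1,2}(?::\\d{2})?\\s*[ap]m\\s*-\\s*\\d{1,2}(?::\\d{2})?\\s*[ap]m\\b",
            Pattern.CASE_INSENSITIVE);

    /** Returns all non-overlapping matches of p in text, in order. */
    static List<String> findAll(Pattern p, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Open 8am - 12pm and 2pm-6:30pm. "
                + "Write to info@example.com or sales@example.com.";
        System.out.println(findAll(TIME_INTERVAL, text));
        System.out.println(findAll(EMAIL, text));
    }
}
```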

  Gianluca

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, January 13, 2010 6:06 PM
To: java-user@lucene.apache.org
Subject: Re: Extracting contact data

Before answering, how do you measure "proximity"? You can make
Lucene work with locations (there's an example in Lucene in Action)
readily enough, though.

HTH
Erick

On Wed, Jan 13, 2010 at 11:39 AM, Ortelli, Gian Luca <
gianluca.orte...@truvo.com> wrote:

> Hi community,
>
>
>
> I have a general understanding of Lucene concepts, and I'm wondering
> if it's the right tool for my job:
>
>
>
> - I need to extract data such as time intervals ("8am - 12pm") and
> street addresses from a set of files. The common issue with these data
> units is that they contain spaces and are not always definable through
> regexes.
>
>
>
> - the extraction must take "proximity" into consideration: for
> example, a mail address which is close to the word "Contacts" will
> receive a higher rank, since I'm looking for contact data.
>
>
>
> Do you think I can get any advantage from building a solution on
> Lucene?
>
>
>
>  Gianluca
>
>

