Hi,

Tools like GATE (http://www.gate.ac.uk) or Apache UIMA would be good candidates for what you are trying to achieve.
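
To make the "number of tokens" proximity measure from the quoted message concrete, here is a minimal Python sketch. It is not GATE or UIMA code. The function name rank_emails, the simplified e-mail regex and the whitespace tokenization are all assumptions for illustration, and it only covers e-mail addresses, which are easy to match with a regex. Street addresses and time intervals are where an NLP toolkit would come in.

```python
import re

# Simplified e-mail pattern (an assumption, not a full RFC 5322 matcher).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def rank_emails(text, context_word="contact"):
    """Return (email, distance) pairs, closest to the context word first.

    distance is the number of whitespace-separated tokens between the
    e-mail and the nearest token starting with context_word, or None
    if the context word does not occur in the text.
    """
    tokens = text.split()

    # Positions of tokens such as "Contact", "Contacts:" or "contact.".
    context_positions = [
        i for i, tok in enumerate(tokens)
        if tok.strip(".,:;()").lower().startswith(context_word)
    ]

    results = []
    for i, tok in enumerate(tokens):
        match = EMAIL_RE.search(tok)
        if match is None:
            continue
        distance = min((abs(i - p) for p in context_positions), default=None)
        results.append((match.group(0), distance))

    # Findings without any context word go last; the rest sort by distance.
    results.sort(key=lambda r: (r[1] is None, r[1] if r[1] is not None else 0))
    return results


if __name__ == "__main__":
    sample = ("Contacts: alice@example.com for sales. "
              "This page was last edited by bob@example.org")
    for email, distance in rank_emails(sample):
        print(email, distance)
```

A smaller distance means a better rank. How to turn that distance into a score, and where to cut off, is the part that would have to be tuned empirically on the real data.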
HTH

--
DigitalPebble Ltd
http://www.digitalpebble.com

2010/1/14 Ortelli, Gian Luca <[email protected]>

> Well, the exact definition we're going to find out empirically,
> as we run an implementation through our data and look at the quality
> of results... For now, I would use the number of tokens between the
> finding ("[email protected]") and the word that gives context ("Contact").
>
> Anyway, replying to karl: I'm not searching for a given
> email/street/time interval/etc., I need to extract EVERY
> email/street/time interval/etc. from the text. The kind of need for
> which you suggest a natural language processing tool.
>
> Gianluca
>
> -----Original Message-----
> From: Erick Erickson [mailto:[email protected]]
> Sent: Wednesday, January 13, 2010 6:06 PM
> To: [email protected]
> Subject: Re: Extracting contact data
>
> Before answering, how do you measure "proximity"? You can make
> Lucene work with locations (there's an example in Lucene In Action)
> readily enough though....
>
> HTH
> Erick
>
> On Wed, Jan 13, 2010 at 11:39 AM, Ortelli, Gian Luca <
> [email protected]> wrote:
>
> > Hi community,
> >
> > I have a general understanding of Lucene concepts, and I'm wondering if
> > it's the right tool for my job:
> >
> > - I need to extract data like e.g. time intervals ("8am - 12pm") or street
> > addresses from a set of files. The common issue with these data units is
> > that they contain spaces and are not always definable through regexes.
> >
> > - the extraction must take into consideration the "proximity": for
> > example, a mail address which is close to the word "Contacts" will
> > receive a higher rank, since I'm looking for contact data.
> >
> > Do you think I can get any advantage from building a solution on Lucene?
> >
> > Gianluca
