Re: How to tune Analyzer for Text Extraction

Shai Erera Tue, 11 Aug 2009 20:38:34 -0700

If the file has a predefined structure, e.g.:
title: something
location: new york
....
then you can write a simple parser that extracts that information.
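
As a rough illustration of the kind of simple parser meant here, the following is a minimal sketch in plain Java (no Lucene involved). The class name KeyValueParser and the method parse are made up for this example, and it assumes one "key: value" pair per line, with the first colon on the line separating key from value.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper, not part of Lucene: parses "key: value" lines. */
final class KeyValueParser {

    /**
     * Returns a map from lower-cased key to trimmed value, in input order.
     * Assumes one pair per line; a repeated key overwrites the earlier value.
     */
    static Map<String, String> parse(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {   // \R matches any line break
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue;                         // no key before a colon: skip the line
            }
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (!key.isEmpty() && !value.isEmpty()) {
                fields.put(key, value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = "title: something\nlocation: new york\n";
        System.out.println(parse(sample).get("location")); // prints "new york"
    }
}
```

Keys are lower-cased so that "Location" and "location" look up the same entry; drop that step if case matters in your data.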

Otherwise, I think this falls outside the scope of Lucene, unless I
misunderstood you.

If I had to take a long shot though, I'd try indexing all the data using
WhitespaceAnalyzer and then querying for "Location". I'd also use the
Highlighter in contrib to find the matching fragments of text, and take
whatever comes after "Location". You would still need to know how much text
to take after Location ...
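
To make the "how much to take" problem concrete, here is a small plain-Java sketch. It does not use Lucene's WhitespaceAnalyzer or Highlighter; it only imitates splitting on whitespace and then keeps a fixed number of tokens after the keyword. The class AfterKeyword, the method take, and the maxTokens cutoff are all assumptions made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not Lucene API: takes the tokens that follow a keyword. */
final class AfterKeyword {

    /**
     * Splits text on whitespace and returns up to maxTokens tokens following
     * the first exact occurrence of keyword, joined by single spaces.
     * Returns null if the keyword token does not occur.
     */
    static String take(String text, String keyword, int maxTokens) {
        String[] tokens = text.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(keyword)) {
                List<String> taken = new ArrayList<>();
                for (int j = i + 1; j < tokens.length && taken.size() < maxTokens; j++) {
                    taken.add(tokens[j]);
                }
                return String.join(" ", taken);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "Some notes. Location: New York and more text";
        System.out.println(take(text, "Location:", 2)); // prints "New York"
    }
}
```

Two things show up in this sketch. Splitting only on whitespace leaves the colon attached, so the token to look for is "Location:" rather than "Location". And the cutoff is a guess: 2 tokens works for "New York" but not for a one-word or three-word place name.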

Maybe if you post a sample input here, it'll trigger some ideas :).

Shai

On Wed, Aug 12, 2009 at 12:27 AM, xs2Abhishek <[email protected]> wrote:

>
> Hi,
>
> I am trying to decide whether or not I can use Lucene for my
> requirements, which mainly involve data tagging. I have to be able to parse
> or index a .txt file and then extract text accordingly. For example,
> if the input document has some text like "Location: New York", then for
> this input I should be able to extract "New York" when the keyword
> Location is present. I am trying to learn about Lucene and looked into
> "tokensFromAnalysis(analyzer, text)", but I'm still not sure how I could
> extract data using Lucene. Can I use queries to extract this piece of
> information?
>
> Any help on this would be appreciated.
>
> Thanks,
> Abhishek
> --
> View this message in context:
> http://www.nabble.com/How-to-tune-Analyzer-for-Text-Extraction-tp24926082p24926082.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.