Re: Highlighter with Field.Store.NO

Pál Barnabás Tue, 10 Mar 2009 03:27:19 -0700

Thanks for the quick answer.
This solution is not possible for me. I want to index millions of e-mails
with attachments (doc, pdf, etc.). The mails and the files are already
stored, so saving the text content in a separate cache is not acceptable.
I tried storing the text with the Field.Store.COMPRESS option, but the
performance was very poor (3x indexing time).


2009/3/9 Ben Martz <[email protected]>

> I use the Highlighter class in a shipping product in which I do not store
> values in the index. Instead, I independently load the contents from my own
> cache and pass them to Highlighter.GetBestFragments(). The only disadvantage
> is that, depending on the size of your contents and the speed of your
> contents cache, this can make highlighting a very expensive operation, so pay
> very careful attention to how and when you load your contents data.
>
> On Mon, Mar 9, 2009 at 8:14 AM, Pál Barnabás <[email protected]> wrote:
>
> > Hi,
> > I'm trying to highlight the keyword in the search result.
> > This is my code:
> > ------------------------------------------------------------------
> > string indexdir = @"D:\temp\index_testing";
> >            if (System.IO.Directory.Exists(indexdir))
> >                System.IO.Directory.Delete(indexdir, true);
> >
> >            IndexWriter writer = new IndexWriter(indexdir, new
> > Lucene.Net.Analysis.Standard.StandardAnalyzer(), true);
> >            // demo text
> >            string scontent = "First, we parse the user-entered query string indicating that we want to match ...";
> >
> >            for (int i = 0; i < 100; i++)
> >            {
> >                Document doc = new Document();
> >
> >                doc.Add(new Field("ID", i.ToString(), Field.Store.YES,
> > Field.Index.UN_TOKENIZED));
> >                doc.Add(new Field("CONTENT", scontent, Field.Store.YES,
> > Field.Index.TOKENIZED));
> >
> >                writer.AddDocument(doc);
> >            }
> >
> >            writer.Close();
> >
> >            IndexReader reader = IndexReader.Open(indexdir);
> >            Searcher searcher = new IndexSearcher(reader);
> >            Analyzer analyzer = new
> > Lucene.Net.Analysis.Standard.StandardAnalyzer();
> >
> >            MultiFieldQueryParser parser = new MultiFieldQueryParser(new
> > string[] { "CONTENT" }, analyzer);
> >
> >            Query query = parser.Parse("indicating");
> >            query = query.Rewrite(reader);
> >            Trace.WriteLine("Searching for: " + query.ToString());
> >
> >            Lucene.Net.Search.Hits hits = searcher.Search(query);
> >
> >            SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<b class='term'>", "</b>");
> >
> >            QueryScorer scorer = new QueryScorer(query);
> >
> >            Highlighter highlighter = new Highlighter(formatter, scorer);
> >            highlighter.SetTextFragmenter(new SimpleFragmenter(2000));
> >
> >            for (int i = 0; i < hits.Length(); i++)
> >            {
> >                Document resdoc = hits.Doc(i);
> >
> >                string s = resdoc.Get("CONTENT");
> >                // s is null if Field.Store is NO
> >                TokenStream tsTitle = analyzer.TokenStream("CONTENT", new
> > System.IO.StringReader(s));
> >                string hl = highlighter.GetBestFragment(tsTitle, s);
> >            }
> > ------------------------------------------------------------------
> >
> > The problem is that when the content is not stored in the index
> > (Field.Store.NO), the result document does not contain the value. Is it
> > possible to use the Highlighter class in this case? Or what is the best
> > way to highlight the search result? Is it possible to get all tokens for
> > hits.Doc(i)?
> >
>
>
>
> --
> 13:37 - Someone stole the precinct toilet. The cops have nothing to go on.
> 14:37 - Officers dispatched to a daycare where a three-year-old was
> resisting a rest.
> 21:11 - Hole found in nudist camp wall. Officers are looking into it.
>
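
The approach Ben describes above (keep the text out of the index, load it
from somewhere else at display time, and hand it to the Highlighter) could be
sketched roughly as follows. This is only a sketch under assumptions: the
TextLoader delegate, the HighlightPage method, and the idea of highlighting
only the hits on the page being displayed are hypothetical and not part of
Lucene.Net; the loader stands in for whatever returns the plain text for a
stored ID (a cache, or re-extraction from the already stored mail or
attachment). The Lucene.Net calls are the ones already used in the code
quoted above, and the ID field must be stored (Field.Store.YES) for the
lookup to work.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Search;
// Assumption: the namespace of the contrib highlighter differs between
// Lucene.Net releases; adjust this line to match the assembly in use.
using Lucene.Net.Highlight;

public static class ExternalTextHighlighting
{
    // Hypothetical hook: returns the plain text for a document ID, or null
    // if the text cannot be loaded. Not a Lucene.Net type.
    public delegate string TextLoader(string id);

    // Highlights only the hits in [first, first + count), so the external
    // text is loaded for the results actually shown, not for every hit.
    public static string[] HighlightPage(Hits hits, int first, int count,
                                         Analyzer analyzer,
                                         Highlighter highlighter,
                                         TextLoader loadText)
    {
        int end = Math.Min(first + count, hits.Length());
        if (first < 0 || first >= end)
            return new string[0];

        string[] fragments = new string[end - first];
        for (int i = first; i < end; i++)
        {
            Document doc = hits.Doc(i);

            // "ID" is stored in the index, "CONTENT" is indexed only.
            string id = doc.Get("ID");
            string text = loadText(id);
            if (String.IsNullOrEmpty(text))
                continue; // nothing to highlight; the entry stays null

            // Re-analyze the externally loaded text with the same analyzer
            // that was used for indexing, then let the highlighter score it.
            TokenStream tokens =
                analyzer.TokenStream("CONTENT", new StringReader(text));
            fragments[i - first] = highlighter.GetBestFragment(tokens, text);
        }
        return fragments;
    }
}
```

With a page size of 10, for example, a call such as
HighlightPage(hits, 0, 10, analyzer, highlighter, loadText) would load and
re-analyze at most ten texts per search, which keeps the cost Ben warns about
bounded by the page size rather than by the number of hits.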
