Thx for the quick answer, but this solution is not possible for me. I want to index millions of e-mails with attachments (doc, pdf, etc.). The mails and the files are already stored, so saving the text content in a separate cache is not acceptable. I tried storing the text with the Field.Store.COMPRESS option, but performance was very poor (indexing took 3x as long).
2009/3/9 Ben Martz <[email protected]>

> I use the Highlighter class in a shipping product in which I do not store
> values in the index. Instead, I independently load the contents from my own
> cache and pass them to Highlighter.GetBestFragments(). The only disadvantage
> is that, depending on the size of your contents and the speed of your
> contents cache, this can make highlighting a very expensive operation, so pay
> very careful attention to how and when you load your contents data.
>
> On Mon, Mar 9, 2009 at 8:14 AM, Pál Barnabás <[email protected]> wrote:
>
> > Hi,
> > I'm trying to highlight the keyword in the search result.
> > This is my code:
> > ------------------------------------------------------------------
> > string indexdir = @"D:\temp\index_testing";
> > if (System.IO.Directory.Exists(indexdir))
> >     System.IO.Directory.Delete(indexdir, true);
> >
> > IndexWriter writer = new IndexWriter(indexdir,
> >     new Lucene.Net.Analysis.Standard.StandardAnalyzer(), true);
> > // demo text
> > string scontent = "First, we parse the user-entered query string indicating that we want to match ...";
> >
> > for (int i = 0; i < 100; i++)
> > {
> >     Document doc = new Document();
> >
> >     doc.Add(new Field("ID", i.ToString(), Field.Store.YES,
> >         Field.Index.UN_TOKENIZED));
> >     doc.Add(new Field("CONTENT", scontent, Field.Store.YES,
> >         Field.Index.TOKENIZED));
> >
> >     writer.AddDocument(doc);
> > }
> >
> > writer.Close();
> >
> > IndexReader reader = IndexReader.Open(indexdir);
> > Searcher searcher = new IndexSearcher(reader);
> > Analyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer();
> >
> > MultiFieldQueryParser parser = new MultiFieldQueryParser(
> >     new string[] { "CONTENT" }, analyzer);
> >
> > Query query = parser.Parse("indicating");
> > query = query.Rewrite(reader);
> > Trace.WriteLine("Searching for: " + query.ToString());
> >
> > Lucene.Net.Search.Hits hits = searcher.Search(query);
> >
> > SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<b class='term'>", "</b>");
> >
> > QueryScorer scorer = new QueryScorer(query);
> >
> > Highlighter highlighter = new Highlighter(formatter, scorer);
> > highlighter.SetTextFragmenter(new SimpleFragmenter(2000));
> >
> > for (int i = 0; i < hits.Length(); i++)
> > {
> >     Document resdoc = hits.Doc(i);
> >
> >     string s = resdoc.Get("CONTENT");
> >     // s is null if Field.Store is NO
> >     TokenStream tsTitle = analyzer.TokenStream("CONTENT",
> >         new System.IO.StringReader(s));
> >     string hl = highlighter.GetBestFragment(tsTitle, s);
> > }
> > ------------------------------------------------------------------
> >
> > The problem is that when the content is not stored in the index
> > (Field.Store.NO), the result document does not contain the value. Is
> > it possible to use the Highlighter class in this case? Or what's the
> > best way to highlight the search result? Is it possible to get all
> > tokens for hits.Doc(i)?
>
> --
> 13:37 - Someone stole the precinct toilet. The cops have nothing to go on.
> 14:37 - Officers dispatched to a daycare where a three-year-old was
> resisting a rest.
> 21:11 - Hole found in nudist camp wall. Officers are looking into it.
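Ben's suggestion (index CONTENT without storing it, then hand the text to the highlighter only at display time) could look roughly like the sketch below. It is a minimal sketch assuming the Lucene.Net 2.x API used in the quoted code plus the Highlighter.Net contrib, whose namespace is assumed here to be `Lucene.Net.Highlight`. `MailStore` and `LoadOriginalText` are hypothetical stand-ins for the already existing mail/attachment storage; in a real system that helper would re-read the mail and re-extract the attachment text, which is exactly the cost Ben warns about.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Highlight;   // Highlighter.Net contrib (namespace assumed for Lucene.Net 2.x)
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

internal static class HighlightWithoutStoredContent
{
    // Hypothetical stand-in for the existing mail store: the text lives
    // outside the Lucene index and is looked up by the stored ID.
    internal static readonly Dictionary<string, string> MailStore =
        new Dictionary<string, string>();

    // Hypothetical loader; a real one would re-read the mail and
    // re-extract the attachment text (doc, pdf, ...).
    internal static string LoadOriginalText(string id)
    {
        string text;
        return MailStore.TryGetValue(id, out text) ? text : null;
    }

    static void Main()
    {
        string indexdir = Path.Combine(Path.GetTempPath(), "index_testing");
        if (System.IO.Directory.Exists(indexdir))
            System.IO.Directory.Delete(indexdir, true);

        string scontent = "First, we parse the user-entered query string indicating that we want to match ...";

        IndexWriter writer = new IndexWriter(indexdir, new StandardAnalyzer(), true);
        for (int i = 0; i < 100; i++)
        {
            string id = i.ToString();
            MailStore[id] = scontent;   // text kept outside the index

            Document doc = new Document();
            doc.Add(new Field("ID", id, Field.Store.YES, Field.Index.UN_TOKENIZED));
            // CONTENT is indexed for searching but NOT stored
            doc.Add(new Field("CONTENT", scontent, Field.Store.NO, Field.Index.TOKENIZED));
            writer.AddDocument(doc);
        }
        writer.Close();

        IndexReader reader = IndexReader.Open(indexdir);
        Searcher searcher = new IndexSearcher(reader);
        Analyzer analyzer = new StandardAnalyzer();

        MultiFieldQueryParser parser =
            new MultiFieldQueryParser(new string[] { "CONTENT" }, analyzer);
        Query query = parser.Parse("indicating").Rewrite(reader);

        Highlighter highlighter = new Highlighter(
            new SimpleHTMLFormatter("<b class='term'>", "</b>"), new QueryScorer(query));
        highlighter.SetTextFragmenter(new SimpleFragmenter(2000));

        Hits hits = searcher.Search(query);
        // Load text only for the hits actually shown (page size of 10 is an
        // arbitrary example value): loading is the expensive step.
        int shown = Math.Min(hits.Length(), 10);
        for (int i = 0; i < shown; i++)
        {
            string id = hits.Doc(i).Get("ID");      // stored field
            string text = LoadOriginalText(id);     // re-read from the original store
            if (text == null) continue;             // mail no longer available
            // Re-analyze with the same analyzer used at index time
            TokenStream ts = analyzer.TokenStream("CONTENT", new StringReader(text));
            Console.WriteLine(id + ": " + highlighter.GetBestFragment(ts, text));
        }

        searcher.Close();
        reader.Close();
    }
}
```

Limiting the loop to the hits on the current page keeps the number of expensive loads proportional to the page size rather than to `hits.Length()`, which matters with millions of mails.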
