Thanks Mark. After setting maxDocCharsToAnalyze to a value greater than 0, I can now extract the span terms.
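
For anyone finding this thread later, here is a rough sketch of why a limit of 0 gave me an empty map. It is plain Java, not Lucene code: limitTokens() and CharLimitSketch are hypothetical names, and the assumption (based on Mark's explanation below, not on reading the Lucene source) is that the extractor stops consuming tokens once the running character count reaches maxDocCharsToAnalyze.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CharLimitSketch {

    // Hypothetical helper, not a Lucene API: keep tokens while the running
    // total of token lengths is still below maxDocCharsToAnalyze.
    static List<String> limitTokens(List<String> tokens, int maxDocCharsToAnalyze) {
        List<String> kept = new ArrayList<>();
        int charsSeen = 0;
        for (String token : tokens) {
            if (charsSeen >= maxDocCharsToAnalyze) {
                break; // budget used up; with a limit of 0 this fires immediately
            }
            kept.add(token);
            charsSeen += token.length();
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("everlasting", "glory", "unity");
        System.out.println(limitTokens(tokens, 0));     // prints []
        System.out.println(limitTokens(tokens, 51200)); // prints [everlasting, glory, unity]
    }
}
```

With a limit of 0 no token is ever seen, so there is nothing to match against the query. That would explain the empty result until the limit is raised.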

I did notice a strange issue though. When the query is just a PhraseQuery (e.g. "everlasting glory"), getWeightedSpanTerms() returns all the span terms along with their span positions. But when the query is a BooleanQuery containing phrase and non-phrase terms (e.g. "everlasting glory"+unity), getWeightedSpanTerms() returns all the span terms, but span positions are returned only for the phrase terms (i.e. "everlasting" and "glory"). The span positions for the non-phrase term (i.e. "unity") are empty.

Any ideas why this could be happening?

-Jahangir

On Thu, Jul 7, 2011 at 4:40 AM, Mark Miller <markrmil...@gmail.com> wrote:

> Sorry - kind of my fault. When I fixed this to use maxDocCharsToAnalyze, I
> didn't set a default other than 0 because I didn't really count on this
> being used beyond how it is in the Highlighter - which always sets
> maxDocCharsToAnalyze with its default.
>
> You've got to explicitly set it higher than 0 for now.
>
> Feel free to create a JIRA issue and we can give it its own default greater
> than 0.
>
> - Mark Miller
> lucidimagination.com
>
> On Jul 6, 2011, at 5:34 PM, Jahangir Anwari wrote:
>
> > I have a CustomHighlighter that extends the SolrHighlighter and overrides
> > the doHighlighting() method. Then for each document I am trying to extract
> > the span terms so that later I can use them to get the span positions. I
> > tried to get the weightedSpanTerms using WeightedSpanTermExtractor but was
> > unsuccessful. Below is the code that I have. Is there something missing
> > that needs to be added to get the span terms?
> >
> > // in CustomHighlighter.java
> > @Override
> > public NamedList doHighlighting(DocList docs, Query query,
> >     SolrQueryRequest req, String[] defaultFields) throws IOException {
> >
> >   NamedList highlightedSnippets =
> >       super.doHighlighting(docs, query, req, defaultFields);
> >
> >   IndexReader reader = req.getSearcher().getIndexReader();
> >
> >   String[] fieldNames = getHighlightFields(query, req, defaultFields);
> >   for (String fieldName : fieldNames)
> >   {
> >     QueryScorer scorer = new QueryScorer(query, null);
> >     scorer.setExpandMultiTermQuery(true);
> >     scorer.setMaxDocCharsToAnalyze(51200);
> >
> >     DocIterator iterator = docs.iterator();
> >     for (int i = 0; i < docs.size(); i++)
> >     {
> >       int docId = iterator.nextDoc();
> >       System.out.println("DocId: " + docId);
> >       TokenStream tokenStream =
> >           TokenSources.getTokenStream(reader, docId, fieldName);
> >       WeightedSpanTermExtractor wste =
> >           new WeightedSpanTermExtractor(fieldName);
> >       wste.setExpandMultiTermQuery(true);
> >       wste.setWrapIfNotCachingTokenFilter(true);
> >
> >       Map<String,WeightedSpanTerm> weightedSpanTerms =
> >           wste.getWeightedSpanTerms(query, tokenStream, fieldName); // this is always empty
> >       System.out.println("weightedSpanTerms: " + weightedSpanTerms.values());
> >
> >     }
> >   }
> >   return highlightedSnippets;
> >
> > }
> >
> > Thanks,
> > Jahangir