Re: Extracting span terms using WeightedSpanTermExtractor

Jahangir Anwari Thu, 07 Jul 2011 14:14:35 -0700

Thanks Mark. After setting maxDocCharsToAnalyze to a value greater than 0, I
can now extract the span terms.
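
For anyone wondering why a limit of 0 produces an empty result, here is a small stand-alone sketch. It is not Lucene code: the class OffsetLimitSketch and the method tokensWithinLimit are hypothetical, and the cutoff rule (check the limit before emitting each token) is only an assumption about how a character limit of this kind could behave.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone model, NOT Lucene code; all names here are hypothetical.
class OffsetLimitSketch {

    // Returns the whitespace-separated tokens that start before maxChars
    // characters have been consumed. Assumption: the limit is checked before
    // each token is emitted, so a limit of 0 lets nothing through.
    static List<String> tokensWithinLimit(String text, int maxChars) {
        List<String> kept = new ArrayList<>();
        int consumed = 0;
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) {
                continue; // skip the empty token produced by leading whitespace
            }
            if (consumed >= maxChars) {
                break; // limit reached: stop analyzing
            }
            kept.add(token);
            consumed += token.length();
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "everlasting glory and unity";
        System.out.println(tokensWithinLimit(text, 0));     // prints []
        System.out.println(tokensWithinLimit(text, 51200)); // prints [everlasting, glory, and, unity]
    }
}
```

Under that assumption, a default of 0 means no token is ever analyzed, which would match the empty map I was seeing until the value was raised.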


I did notice a strange issue, though. When the query is just a
PhraseQuery (e.g. "everlasting glory"), getWeightedSpanTerms() returns all
the span terms along with their span positions. But when the query is a
BooleanQuery containing phrase and non-phrase terms (e.g. "everlasting
glory"+unity), getWeightedSpanTerms() returns all the span terms, but the
span positions are returned only for the phrase terms (i.e. "everlasting" and
"glory"). The span positions for the non-phrase term (i.e. "unity") are empty.
Any ideas why this could be happening?

-Jahangir

On Thu, Jul 7, 2011 at 4:40 AM, Mark Miller <[email protected]> wrote:

> Sorry - kind of my fault. When I fixed this to use maxDocCharsToAnalyze, I
> didn't set a default other than 0 because I didn't really count on this
> being used beyond how it is in the Highlighter - which always sets
> maxDocCharsToAnalyze with its default.
>
> You've got to explicitly set it higher than 0 for now.
>
> Feel free to create a JIRA issue and we can give it its own default greater
> than 0.
>
> - Mark Miller
> lucidimagination.com
>
>
> On Jul 6, 2011, at 5:34 PM, Jahangir Anwari wrote:
>
> > I have a CustomHighlighter that extends the SolrHighlighter and overrides
> > the doHighlighting() method. Then, for each document, I am trying to
> > extract the span terms so that later I can use them to get the span
> > positions. I tried to get the weightedSpanTerms using
> > WeightedSpanTermExtractor but was unsuccessful. Below is the code that I
> > have. Is there something missing that needs to be added to get the span
> > terms?
> >
> > // in CustomHighlighter.java
> > @Override
> > public NamedList doHighlighting(DocList docs, Query query,
> >     SolrQueryRequest req, String[] defaultFields) throws IOException {
> >
> >   NamedList highlightedSnippets =
> >       super.doHighlighting(docs, query, req, defaultFields);
> >
> >   IndexReader reader = req.getSearcher().getIndexReader();
> >
> >   String[] fieldNames = getHighlightFields(query, req, defaultFields);
> >   for (String fieldName : fieldNames) {
> >     QueryScorer scorer = new QueryScorer(query, null);
> >     scorer.setExpandMultiTermQuery(true);
> >     scorer.setMaxDocCharsToAnalyze(51200);
> >
> >     DocIterator iterator = docs.iterator();
> >     for (int i = 0; i < docs.size(); i++) {
> >       int docId = iterator.nextDoc();
> >       System.out.println("DocId: " + docId);
> >       TokenStream tokenStream =
> >           TokenSources.getTokenStream(reader, docId, fieldName);
> >       WeightedSpanTermExtractor wste =
> >           new WeightedSpanTermExtractor(fieldName);
> >       wste.setExpandMultiTermQuery(true);
> >       wste.setWrapIfNotCachingTokenFilter(true);
> >
> >       // this is always empty
> >       Map<String,WeightedSpanTerm> weightedSpanTerms =
> >           wste.getWeightedSpanTerms(query, tokenStream, fieldName);
> >       System.out.println("weightedSpanTerms: " + weightedSpanTerms.values());
> >     }
> >   }
> >   return highlightedSnippets;
> > }
> >
> > Thanks,
> > Jahangir
>
