The problem is that I need to be able to match spans resulting from a a
SpanNearQuery with the Term they came from so I can eliminate using Payloads
from certain Terms on a query-by-query basis.
I still need this term to effect the results of a NearSpanQuery as per the
usual logic, I just need to know when iterating over the resulting Spans
that when I hit one originating from a certain Term not to load it's payload
data.
I recently solved the problem fairly simply after doing much research into
the source. When I am building the query and encounter a term I don't want
to recover payload data from, I add my own anonymous sub-type of
SpanTermQuery to my developing SpanNearQuery that itself creates an
anonymous sub-type of SpanTerm which simply returns an empty Collection for
it's payload data.
new SpanTermQuery(new Term(QueryVocabTracker.CONTENT_FIELD, tagToken)) {
@Override
public Spans getSpans(IndexReader reader)
throws IOException {
return new
TermSpans(reader.termPositions(term), term) {
@Override
public Collection getPayload()
throws IOException {
// no payload data for this
TermSpan
return Collections.emptyList();
}
};
}
}
thanks,
C>T>
On Wed, Nov 25, 2009 at 8:10 AM, Grant Ingersoll <[email protected]>wrote:
>
> On Nov 24, 2009, at 9:56 AM, Christopher Tignor wrote:
>
> > Hello,
> >
> > For certain span queries I construct problematically by piecing together
> my
> > own SpanTermQueries I would like to enforce that Payload data is not
> > returned for matches on those specific terms used by the constituent
> > SapnTermQueries.
>
> I'm not sure I follow. For those terms you don't want payloads, why can't
> you just avoid getting payloads? Span queries themselves do not require
> payloads for execution. Can you share your code for iterating over the
> spans?
>
> >
> > For exmaple if I search for a position match with a SpanQuery referencing
> > the tokens "_n" and "work" and there is Payload data for each (there
> needs
> > to be for other types of queries) I would like to be able to screen out
> the
> > payload data originating from any matched "_n" tokens.
> >
> > I thought for the tokens I am not interested in receiving payload data
> from
> > I might simply create (anonymously) my own subclass of SpanTermQuery
> which
> > overrides getSpans and returns another custom class which extends
> TermSpans
> > but there simply overrides isPayloadAvailable to return false:
> >
> > new SpanTermQuery(new Term(myField, myTokenString)) {
> >
> >
> >
> > public Spans getSpans(IndexReader reader)
> > throws IOException {
> > return new
> > TermSpans(reader.termPositions(term), term) {
> >
> > public boolean
> isPayloadAvailable()
> > {
> > return false;
> > }
> >
> > };
> > }
> > });
> >
> > This however seems to eliminating payload data for all matches though I'm
> > not sure why and am tracing through the code, looking at
> NearSpansUnordered.
> >
> > Any thoughts?
> >
> > thanks so much,
> >
> > C>T>
> >
> >
> > --
> > TH!NKMAP
> >
> > Christopher Tignor | Senior Software Architect
> > 155 Spring Street NY, NY 10012
> > p.212-285-8600 x385 f.212-285-8999
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
--
TH!NKMAP
Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999