I've been thinking about how to add spans to Solr, but haven't
actually written any code for it yet. I see no reason why a query
parser couldn't support some span syntax, and the "dump spans"
approach could be co-opted to write the spans out to the response.
It seems like it would need to be an additional part of the
QueryComponent, plus some addition to the query parsers. We can more
easily add it to the Dismax parser, but if we add it to the Lucene
one, then we should make that change in Lucene.
-Grant
On Apr 29, 2009, at 7:06 PM, Sean O'Connor wrote:
Hello,
I'm trying to find a decent approach for getting token positions
out of (or is that into?) Solr query results. Is the best approach
to extend a QueryComponent and/or HighlightComponent? I'm new to
Solr, and still on fairly shaky ground, so any pointers or suggestions
are quite welcome.
As a little BACKGROUND:
I am trying to migrate a custom Lucene-only content analysis
project to Solr. The 'old' system programmatically runs a few
thousand predefined queries against a corpus, and then analyzes the
results. The Lucene score is good, but the actual position of the
hits is also quite important.
My previous system did some simple query parsing to create
SpanQuery objects, and then used a modified dumpSpans() to get the
token positions from the spans. Now I am trying to find out how to use
Solr's goodness (and the MemoryIndex approach?) to get the span
positions in a more logical manner. I think the answer is in the
highlighter, but I'm getting a little twisted around, and could use a
pointer.
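To make "token position from the spans" concrete, here is a toy sketch in plain Python. It does not use Lucene or Solr at all; the function names (tokenize, term_spans, near_spans) are made up for illustration, and the whitespace tokenizer is an assumption standing in for a real analyzer. It reports each match as a half-open (start, end) pair of token positions, which is, as far as I know, how Lucene's Spans reports start() and end().

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token positions, end is exclusive


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase + whitespace split stands in for a real analyzer.
    return text.lower().split()


def term_spans(tokens: List[str], term: str) -> List[Span]:
    # One single-token span for every position where the term occurs.
    return [(i, i + 1) for i, tok in enumerate(tokens) if tok == term]


def near_spans(tokens: List[str], first: str, second: str, slop: int) -> List[Span]:
    # Ordered "near" match: `first` occurs before `second` with at most
    # `slop` other tokens between them. Returns the enclosing span.
    spans = []
    for _start1, end1 in term_spans(tokens, first):
        for start2, end2 in term_spans(tokens, second):
            gap = start2 - end1  # number of tokens between the two terms
            if 0 <= gap <= slop:
                spans.append((_start1, end2))
    return spans


if __name__ == "__main__":
    tokens = tokenize("the quick brown fox jumps over the lazy dog")
    for start, end in near_spans(tokens, "quick", "fox", slop=1):
        print(start, end, tokens[start:end])
```

The positions that come back are what the analysis step needs alongside the score; a real implementation would get them from the index rather than from re-tokenizing the text.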
I am using a recent Solr nightly snapshot, Grails, Aduna Aperture,
and IntelliJ (if any of that matters).
Thanks,
Sean
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/