I've been thinking about how to add spans to Solr, but haven't actually codified it yet. I see no reason why a query parser can't support some syntax for them, or why the "dump spans" approach can't be co-opted to write the spans out to the response. Seems like it would need to be an additional part of the QueryComponent, plus some addition to the query parsers. We can more easily add it to the Dismax parser, but if we add it to the Lucene one, then we should make that change in Lucene.

-Grant

On Apr 29, 2009, at 7:06 PM, Sean O'Connor wrote:

Hello,
I'm trying to find a decent approach for getting token positions out of (or is that into?) Solr query results. Is the best approach to extend a QueryComponent and/or HighlightComponent? I'm new to Solr, and still on fairly shaky ground, so any pointers or suggestions are quite welcome.

  As a little BACKGROUND:
I am trying to migrate a custom Lucene-only content analysis project to Solr. The 'old' system programmatically runs a few thousand predefined queries against a corpus, and then analyzes the results. The Lucene score is good, but the actual position of the hits is also quite important.

My previous system did some simple query parsing to create SpanQuery objects, and then used a modified dumpSpans() to get the token positions from the spans. Now I am trying to find out how to use Solr's goodness (and the MemoryIndex approach?) to get the span positions in a more logical manner. I think the answer is in the highlighter, but I'm getting a little twisted around, and could use a pointer.
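
To make concrete what a "dump spans" pass reports, here is a rough, self-contained sketch in plain Python. It does not use Lucene or Solr at all; the tokenizer and the tokenize, near_spans and dump_spans helpers are hypothetical stand-ins for illustration only, and the matching rule (first term followed by second term with at most `slop` tokens between them) is a simplification, not Lucene's exact span semantics. Each hit is a (doc, start, end) triple of token positions, with end treated as exclusive in this sketch.

```python
import re


def tokenize(text):
    """Lowercase and split on non-word characters.

    The index of each token in the returned list is its token position.
    """
    return re.findall(r"\w+", text.lower())


def term_positions(tokens, term):
    """Return every token position at which `term` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def near_spans(tokens, first, second, slop=0):
    """Find ordered matches of `first` ... `second`.

    A match requires `second` to follow `first` with at most `slop`
    tokens in between. Returns (start, end) pairs, end exclusive.
    """
    spans = []
    second_positions = term_positions(tokens, second)
    for start in term_positions(tokens, first):
        for pos in second_positions:
            gap = pos - start - 1  # tokens strictly between the two terms
            if 0 <= gap <= slop:
                spans.append((start, pos + 1))
    return spans


def dump_spans(corpus, first, second, slop=0):
    """Collect (doc_id, start, end) for every match in every document."""
    results = []
    for doc_id, text in enumerate(corpus):
        for start, end in near_spans(tokenize(text), first, second, slop):
            results.append((doc_id, start, end))
    return results


corpus = [
    "The quick brown fox jumps over the lazy dog",
    "A quick red fox and a quick brown fox",
]

# "quick" followed by "fox" with at most one token in between
hits = dump_spans(corpus, "quick", "fox", slop=1)
for doc_id, start, end in hits:
    print(f"doc={doc_id} start={start} end={end}")
```

A response writer would only need to serialize triples like these alongside each document, which is roughly the kind of output I am hoping to get back from Solr.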

I am using a recent Solr nightly snapshot, Grails, Aduna Aperture, and IntelliJ (if any of that matters).
Thanks,

Sean


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
