Hi Eric,
If you add the &debugQuery=on parameter, you'll receive the "explain" structure,
which tells you the factors behind the score calculation. An example:
<str name="oai:URMST:Transformation_Service/10000">
1.5076942 = (MATCH) fieldWeight(text:chant in 0), product of:
1.4142135 = tf(termFreq(text:chant)=2)
6.8230457 = idf(docFreq=1, numDocs=676)
0.15625 = fieldNorm(field=text, doc=0)
</str>
Here tf(termFreq(text:chant)=2) tells you that the queried term was found two
times in the document. You can apply a regex to extract this information from
the explain string. Since this is an analyzed term, it may not be identical to
the user's input, but the 'parsedquery' entry in the debug output tells you
which terms Solr actually searched for behind the scenes.
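A minimal Python sketch of that regex step. It assumes the explain text uses the format shown above, i.e. `tf(termFreq(field:term)=N)`. Other Solr/Lucene versions, and phrase queries (which report `phraseFreq` instead), format this differently, so the pattern would need adjusting. The function name `term_frequencies` is just for illustration.

```python
import re

# Matches entries like "tf(termFreq(text:chant)=2)" in the explain format above.
TERM_FREQ_RE = re.compile(
    r"tf\(termFreq\((?P<field>[^:()]+):(?P<term>[^()]+)\)=(?P<freq>\d+)\)"
)

def term_frequencies(explain: str) -> dict:
    """Return {(field, term): frequency} for each termFreq entry in an explain string."""
    counts = {}
    for m in TERM_FREQ_RE.finditer(explain):
        # A repeated (field, term) pair reports the same frequency, so overwriting is harmless.
        counts[(m.group("field"), m.group("term"))] = int(m.group("freq"))
    return counts

explain = """1.5076942 = (MATCH) fieldWeight(text:chant in 0), product of:
1.4142135 = tf(termFreq(text:chant)=2)
6.8230457 = idf(docFreq=1, numDocs=676)
0.15625 = fieldNorm(field=text, doc=0)"""

print(term_frequencies(explain))  # {('text', 'chant'): 2}
```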
In Lucene, if the field stores term vectors with positions (and offsets), there
are API calls that give you the exact location of the term within the field (as
character offsets, or as the n-th token), but I don't know how to extract this
information through Solr.
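To make the two kinds of location concrete, here is a toy Python illustration. It is not Lucene's term vector API: it uses a hypothetical, very simple tokenizer (lowercased runs of word characters) to show what "the n-th token" and "character offsets" mean for one term in a field value. A real Solr/Lucene analyzer (stemming, stopwords) could produce different tokens and positions.

```python
import re

def term_positions(text: str, term: str):
    """Return (token_position, start_offset, end_offset) for each occurrence of term.

    Toy tokenizer (assumption): runs of word characters compared case-insensitively,
    with no stemming or stopword removal, unlike a real Lucene analyzer.
    """
    hits = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        if match.group().lower() == term.lower():
            hits.append((position, match.start(), match.end()))
    return hits

print(term_positions("Gregorian chant and Byzantine chant", "chant"))
# [(1, 10, 15), (4, 30, 35)]
```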
Hope this helps.
Király Péter
eXtensible Catalog
http://xcproject.org
----- Original Message -----
From: "Eric James" <cirese...@hotmail.com>
To: <CODE4LIB@LISTSERV.ND.EDU>
Sent: Friday, October 16, 2009 9:52 PM
Subject: Re: [CODE4LIB] solr - search query count | highlighting
Thanks for your response. But yes, I'm able to use facets in general, and
yes, I'm able to do highlighting on stored fields.
But my question is how to find how many times the query appears in the full
text. For example, say you search on "Heisenberg". We'd like to see:
Hit 1: Your search for Heisenberg appears 10 times within the Finding Aid
Hit 2: Your search for Heisenberg appears 3 times within the Finding Aid
Hit 3: Your search for Heisenberg appears 88 times within the Finding Aid
etc.
Could there be a Solr parameter that calculates this? Otherwise, a klugey,
not very scalable method could be: once you retrieve a Solr result XML,
find the Fedora PID, retrieve the EAD full text, run a standard function to
count how many times the query appears in the text for each hit, and add
these counts back into the XML as parameters.
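A rough Python sketch of that post-processing step. The field name `PID` and the element name `query_count` are hypothetical, and `fetch_full_text` stands in for whatever retrieves the EAD text from Fedora (its implementation is an assumption and is not shown). The count is a case-insensitive whole-word match on the raw text, so it will not always agree with Solr's analyzed term counts (e.g. when stemming is involved).

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable

def add_query_counts(solr_xml: str, query: str,
                     fetch_full_text: Callable[[str], str],
                     pid_field: str = "PID") -> str:
    """Add <int name="query_count"> to each <doc> in a Solr XML response."""
    root = ET.fromstring(solr_xml)
    # Whole-word, case-insensitive match of the raw query string (not Solr's analysis chain).
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    for doc in root.iter("doc"):
        pid_elem = doc.find(f"str[@name='{pid_field}']")
        if pid_elem is None or pid_elem.text is None:
            continue  # no PID stored for this hit; leave it unchanged
        full_text = fetch_full_text(pid_elem.text)  # hypothetical: fetch EAD text from Fedora
        count_elem = ET.SubElement(doc, "int", {"name": "query_count"})
        count_elem.text = str(len(pattern.findall(full_text)))
    return ET.tostring(root, encoding="unicode")

# Usage with an in-memory stand-in for Fedora:
response = """<response><result name="response" numFound="1" start="0">
<doc><str name="PID">ead:1</str></doc></result></response>"""
texts = {"ead:1": "Heisenberg wrote to Bohr. heisenberg replied."}
result = add_query_counts(response, "Heisenberg", lambda pid: texts.get(pid, ""))
print(result)
```

Because each hit requires fetching a whole EAD, the cost grows with the number of hits processed, so this is probably only practical for the page of results actually displayed.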
Date: Fri, 16 Oct 2009 15:27:42 -0400
From: ewg4x...@gmail.com
Subject: Re: [CODE4LIB] solr - search query count | highlighting
To: CODE4LIB@LISTSERV.ND.EDU
Hi Eric,
You do not have to store the entire text content of the EAD guide in order
to enable facets. Here's an example:
http://kittredgecollection.org/results?q=*:* . There are about 15 facets
enabled on a collection of almost 1500 EAD documents (though these are quite
small in file size compared to traditional EAD finding aids), and there's no
slowdown whatsoever. I don't believe you need to store the guides to enable
highlighting either, though I have heard there is some dropoff in
performance with highlighting enabled. I've never done benchmarking on
highlighting enabled versus disabled, so I can't tell you how much of a
dropoff there is. In an index of only several hundred documents, I would
think that the dropoff with highlighting enabled would be fairly
negligible.
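For reference, facets and highlighting are both requested per query through standard Solr request parameters (`facet`, `facet.field`, `hl`, `hl.fl`, `hl.snippets`, `hl.fragsize`). A minimal Python sketch that builds such a request URL; the base URL and the field names (`subject_facet`, `creator_facet`, `ead_text`) are hypothetical placeholders. Solr's standard highlighter builds snippets from the stored value of each `hl.fl` field, so this assumes that field is stored.

```python
from urllib.parse import urlencode

def build_select_url(base_url: str, query: str, facet_fields: list,
                     highlight_field: str, snippets: int = 3,
                     fragsize: int = 100) -> str:
    """Build a Solr select URL requesting facet counts and highlighted snippets."""
    params = [
        ("q", query),
        ("facet", "true"),
        *[("facet.field", f) for f in facet_fields],  # one facet.field per field
        ("hl", "true"),
        ("hl.fl", highlight_field),      # field to build snippets from
        ("hl.snippets", str(snippets)),  # max snippets per field per document
        ("hl.fragsize", str(fragsize)),  # approximate snippet size in characters
    ]
    return base_url + "?" + urlencode(params)

url = build_select_url("http://localhost:8983/solr/select", "Heisenberg",
                       ["subject_facet", "creator_facet"], "ead_text")
print(url)
```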
Ethan
On Fri, Oct 16, 2009 at 3:12 PM, Eric James <cirese...@hotmail.com> wrote:
> For our finding aids, we are using fedoragenericsearch 2.2 with Solr as the
> index. Because the EADs can be huge, the EADs are indexed but not stored
> (with stored EADs, search time for ~500 objects = 20 min rather than < 1
> sec).
>
> However, we would like to show the number of times the search terms are
> found within each hit. For example, CDL's collection:
>
> http://www.oac.cdlib.org/search?query=Donner
>
> Also, we would like highlighting/snippets of the search term similar to
> CDL's.
>
> Is it a lost cause to have this functionality without storing the EAD?
> Is there a way to store the EAD and have a reasonable response time?
>
> ---
> Eric James
> Yale University Libraries