Re: [CODE4LIB] solr - search query count | highlighting

Eric James Fri, 16 Oct 2009 12:55:25 -0700

Thanks for your response. Yes, I'm able to use facets in general, and yes,
I'm able to do highlighting on stored fields.

My question, though, is how to count the number of times the query appears in
the full text. For example, say you search for "Heisenberg". We'd like to see:

Hit 1: Your search for Heisenberg appears 10 times within the Finding Aid

Hit 2: Your search for Heisenberg appears 3 times within the Finding Aid

Hit 3: Your search for Heisenberg appears 88 times within the Finding Aid

etc.

Is there a Solr parameter that calculates this? Otherwise, a kludgey, not
very scalable method would be: once you retrieve the Solr result XML, find
the Fedora PID for each hit, retrieve the EAD full text, run a standard
function to count how many times the query appears in the text, and add
the counts back into the XML as parameters.
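
A minimal Python sketch of that fallback, under stated assumptions: the
Fedora PIDs have already been pulled out of the Solr result XML, and
`fetch_text` is a hypothetical callback (not part of fedoragenericsearch or
Solr) that returns the plain text of the EAD for a PID. The regex count is
only an approximation of what Solr matched: it ignores Solr's analyzer
(stemming, stop words, tokenization), so the numbers may differ from Solr's
own notion of a hit.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, List


def ead_text(ead_xml: str) -> str:
    """Flatten an EAD XML document to plain text so tag names are not counted."""
    root = ET.fromstring(ead_xml)
    # Join with spaces so words in adjacent elements do not run together.
    return " ".join(root.itertext())


def count_occurrences(text: str, term: str) -> int:
    """Count case-insensitive, whole-word occurrences of `term` in `text`."""
    words = term.split()
    if not words:
        return 0
    # Allow any run of whitespace (including line breaks) between phrase words.
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def annotate_hits(
    pids: List[str], term: str, fetch_text: Callable[[str], str]
) -> List[str]:
    """Build one summary line per hit, in Solr result order.

    `fetch_text` is a hypothetical callback that returns the EAD full text
    for a Fedora PID; wire it to your own repository client.
    """
    lines = []
    for rank, pid in enumerate(pids, start=1):
        n = count_occurrences(fetch_text(pid), term)
        lines.append(
            f"Hit {rank}: Your search for {term} appears {n} times "
            "within the Finding Aid"
        )
    return lines


if __name__ == "__main__":
    # In-memory stand-in for Fedora; the PIDs and EAD snippets are made up.
    fake_repo = {
        "demo:1": "<ead><p>Heisenberg wrote to Bohr.</p><p>heisenberg again</p></ead>",
        "demo:2": "<ead><p>No match here.</p></ead>",
    }
    hits = annotate_hits(
        ["demo:1", "demo:2"], "Heisenberg", lambda pid: ead_text(fake_repo[pid])
    )
    for line in hits:
        print(line)
```

Passing the fetch step in as a callback keeps the counting logic testable
without a running Fedora instance. The cost the post worries about is still
there: every hit on a result page means one extra full-text fetch.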

> Date: Fri, 16 Oct 2009 15:27:42 -0400
> From: ewg4x...@gmail.com
> Subject: Re: [CODE4LIB] solr - search query count | highlighting
> To: CODE4LIB@LISTSERV.ND.EDU
> 
> Hi Eric,
> 
> You do not have to store the entire text content of the EAD guide in order
> to enable facets. Here's an example:
> http://kittredgecollection.org/results?q=*:* . There are about 15 facets
> enabled on a collection of almost 1500 EAD documents (though quite small in
> filesize compared to traditional EAD finding aids), and there's no slowdown
> whatsoever. I don't believe you need to store the guides to enable
> highlighting either, though I have heard there is some dropoff in
> performance with highlighting enabled. I've never done benchmarking on
> highlighting enabled versus disabled, so I can't tell you how much of a
> dropoff there is. In an index of only several hundred documents, I would
> think that the dropoff with highlighting enabled would be fairly negligible.
> 
> Ethan
> 
> On Fri, Oct 16, 2009 at 3:12 PM, Eric James <cirese...@hotmail.com> wrote:
> 
> > For our finding aids, we are using fedoragenericsearch 2.2 with solr as
> > index. Because the EADs can be huge, the EADs are indexed but not stored
> > (with stored EADs, search time for ~500 objects = 20 min rather than < 1
> > sec).
> >
> > However, we would like to show the number of times the search terms are
> > found within each hit. For example, CDL's collection:
> >
> > http://www.oac.cdlib.org/search?query=Donner
> >
> > Also we would like highlighting/snippets of the search term similar to
> > CDL's.
> >
> > Is it a lost cause to have this functionality without storing the EAD? Is
> > there a way to store the EAD and have a reasonable response time?
> >
> > ---
> >
> > Eric James
> >
> > Yale University Libraries
> >
