Re: [CODE4LIB] Precision and Recall

2011-06-04 Thread Max L. Wilson
Modern Information Retrieval is certainly an authoritative text on IR. You
might also do well to search for ACM SIGIR references on TFxIDF or BM25 by Stephen
Robertson and Karen Sparck Jones, whose foundational work dates from the 70s. These
were retrieval algorithms that significantly improved the precision and recall of
keyword search systems.

These algorithms, or modern evolutions of them, are now embedded as standard in
things like Solr/Lucene. (Apologies if you know all this.)
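
For anyone who has not seen the scoring side of this, here is a minimal sketch of a BM25-style ranking function. It assumes documents are already tokenised into lists of terms and uses the commonly cited defaults k1 = 1.2 and b = 0.75; the function name and sample data are hypothetical, and production engines differ in detail.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25-style score per document (assumes docs is non-empty)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # term absent from this document contributes nothing
            # Rare terms get a higher weight; the "1 +" keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1 and is normalised by length via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["library", "catalogue", "search"],
    ["search", "engine", "ranking", "search"],
    ["cooking", "recipes"],
]
scores = bm25_scores(["search", "ranking"], docs)
```

The document that matches both query terms should score highest, and the one that matches neither should score zero.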

This would only really apply if you were doing full-text indexing of databases.
Retrieving by specific metadata in a field is a different ballgame.

You can also look at average precision, which assesses the ranking of the top N
results, or mean average precision, which takes the mean of that over M queries.
In general it's assumed that if you increase recall (get more of the relevant
results) you'll reduce precision (get more unrelated ones too), and that
increasing precision (removing unrelated results) may reduce recall (removing
some good ones by mistake).
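
A rough sketch of these ranked measures, assuming each query comes with a ranked list of document IDs and a set of IDs judged relevant. All function names and the sample data are made up for illustration.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall over the top k results (assumes k >= 1)."""
    relevant = set(relevant)
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant doc appears."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved count as zero.
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
print(precision_recall_at_k(ranked, relevant, 1))  # short list: high precision
print(precision_recall_at_k(ranked, relevant, 4))  # longer list: higher recall
print(average_precision(ranked, relevant))
```

Going from the top 1 to the top 4 results here raises recall from 1/3 to 2/3 while precision drops from 1.0 to 0.5, which is the trade-off described above.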

Hope some of that is useful. And not patronising (I'm just sort of sitting on
the fence of this email list).

Max


On 3 Jun 2011, at 20:57, marijane white marijane.wh...@gmail.com wrote:

 I think that's quite possible.
 
 Here are a couple references I am familiar with.
 
 Walker/Janes/Tenopir's Online Retrieval is a bit dated but it does discuss
 the subject of precision and recall in bibliographic database searching.
 http://books.google.com/books?id=Srn3Jg7O4XoC&lpg=PP1&pg=PP1#v=onepage&q&f=false
 
 Beyond bibliographic databases, Baeza-Yates/Ribeiro-Neto's Modern Information
 Retrieval discusses the subject in a broader context.
 http://www.amazon.com/Modern-Information-Retrieval-Concepts-Technology/dp/0321416910
 
 
 -marijane
 


Re: [CODE4LIB] Precision and Recall

2011-06-04 Thread Simon Spero
On Sat, Jun 4, 2011 at 4:56 AM, Max L. Wilson m.l.wil...@swansea.ac.uk
wrote:

 Modern Information Retrieval is certainly an authoritative text on IR. You
might also do well to search for ACM SIGIR references on TFxIDF or BM25 by
Stephen Robertson and Karen Sparck Jones, whose foundational work dates from the
70s. These were retrieval algorithms that significantly improved the precision
and recall of keyword search systems.

 Just wanted to add that the SIGIR museum
(http://www.sigir.org/museum/contents.html) includes [Sparck Jones, Karen, ed.
(1981). Information Retrieval Experiment. Butterworths.]

There are several chapters by KSJ, including a review of tests from
1958-1978, as well as a separate chapter on the Cranfield tests.

There are different issues involved in evaluating interactive information
retrieval systems.
 See [Kelly, Diane (2009). Methods for Evaluating Interactive Information
Retrieval Systems with Users
(http://ils.unc.edu/%7Edianek/FnTIR-Press-Kelly.pdf).
Foundations and Trends in Information Retrieval, Vol. 3, Nos. 1-2, pp. 1-224].

Of course, the only true measure of an IR system is whether it gets every
searcher their documents, and saves their time.

Simon


Re: [CODE4LIB] Precision and Recall

2011-06-03 Thread Dave Caroline
The questions seem related to search engines, so perhaps you should be googling
for full-text indexes or, to use the more correct name, inverted indexes,
because in the normal scheme of events databases return exactly what
you ask for.
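
For readers unfamiliar with the term, here is a minimal sketch of an inverted index. It assumes lowercased whitespace tokenisation and AND semantics for multi-word queries; the function names and sample records are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document IDs whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return IDs of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Look up the posting set for each term, then intersect them.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


docs = {
    1: "Precision and recall",
    2: "Recall of keyword search",
    3: "Inverted index for keyword search",
}
index = build_inverted_index(docs)
print(search_all(index, "keyword search"))
```

The lookup goes from term to documents rather than scanning each document for the term, which is why the structure is called "inverted".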

Dave Caroline

On Fri, Jun 3, 2011 at 4:18 PM, Fleming, Declan dflem...@ucsd.edu wrote:
 Hi folks!

 I got an interesting question from one of our librarians working on a paper, 
 and we want to include a bit about the qualities of a database, such as 
 precision and recall.  She is looking for references.

 I did the Google/Wikipedia lookups, but I'm sure she's done that too.

 http://en.wikipedia.org/wiki/Information_retrieval

 and here:

 http://en.wikipedia.org/wiki/Precision_and_recall


 If this subject resonates with anyone, give me a shout and maybe some links.

 Thanks!
 Declan



Re: [CODE4LIB] Precision and Recall

2011-06-03 Thread Alain Borel

Dave Caroline dave.thearchiv...@gmail.com wrote:

The questions seem related to search engines, so perhaps you should be googling
for full-text indexes or, to use the more correct name, inverted indexes,
because in the normal scheme of events databases return exactly what
you ask for.


One could argue that the same thing happens with search engines. After  
all, both databases and search engines are deterministic programs that  
provide a set of records in response to a query.


Precision and recall are not determined by what you ask; what defines
them is how relevant the output records are with respect to a
real-life question. They aren't tied to a technology. Of course, it can
be more or less difficult to translate this question into a query, and
the program might be more or less smart while processing the query.
Both aspects affect precision and recall, in my opinion.
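
To make that concrete, here is a small sketch that measures two result sets against the same set of records judged relevant to the real-life question. The record IDs and the two result sets are invented for illustration, and the function name is hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Precision: what fraction of the returned records are relevant?
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: what fraction of the relevant records were returned?
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Records a human judged relevant to the underlying question.
relevant = {"r1", "r2", "r3", "r4"}

# An exact-match field query returns only what was literally asked for.
exact_match = {"r1"}
# A looser keyword query returns more good records, plus some noise.
keyword = {"r1", "r2", "r3", "x1", "x2", "x3"}

print(precision_recall(exact_match, relevant))  # (1.0, 0.25)
print(precision_recall(keyword, relevant))      # (0.5, 0.75)
```

Neither system is "wrong" about the query it was given; the two measures only compare the output with what the searcher actually needed.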


Anybody who has ever searched a bibliographic database with Google-like
queries can testify that a database can have extremely poor precision
and recall in some use cases ;-)


Best regards,
Alain Borel
EPFL Bibliothèque
Rolex Learning Center
1015 Lausanne (Switzerland)


Re: [CODE4LIB] Precision and Recall

2011-06-03 Thread Fleming, Declan
Hi - I'm wondering if she is using a definition of "database" that seems to be
common in libraries: a resource on the web that we pay for.

D



Re: [CODE4LIB] Precision and Recall

2011-06-03 Thread marijane white
I think that's quite possible.

Here are a couple references I am familiar with.

Walker/Janes/Tenopir's Online Retrieval is a bit dated but it does discuss
the subject of precision and recall in bibliographic database searching.
http://books.google.com/books?id=Srn3Jg7O4XoC&lpg=PP1&pg=PP1#v=onepage&q&f=false

Beyond bibliographic databases, Baeza-Yates/Ribeiro-Neto's Modern Information
Retrieval discusses the subject in a broader context.
http://www.amazon.com/Modern-Information-Retrieval-Concepts-Technology/dp/0321416910


-marijane

On Fri, Jun 3, 2011 at 10:53 AM, Fleming, Declan dflem...@ucsd.edu wrote:

 Hi - I'm wondering if she is using a definition of "database" that seems to
 be common in libraries: a resource on the web that we pay for.

 D
