Re: [CODE4LIB] Precision and Recall
Modern Information Retrieval is certainly an authoritative text on IR. You might do well to search for ACM SIGIR references on TF-IDF or BM25 by Stephen Robertson and Karen Spärck Jones; much of the foundational work dates from the 1970s. These retrieval algorithms significantly improved the precision and recall of keyword search systems, and they, or modern evolutions of them, are now embedded as standard in things like Solr/Lucene. (Apologies if you know all this.) This would really only apply if you were doing full-text indexing of databases; retrieving by specific metadata in a field is a different ballgame.

You can also look at average precision, which averages the precision measured at each rank where a relevant result appears (usually within the top N results), or mean average precision (MAP), which takes the mean of that figure over M queries. In general it's assumed that if you increase recall (get more related results) you'll reduce precision (get more unrelated ones too), and that increasing precision (removing unrelated results) may reduce recall (removing some good ones by mistake).

Hope some of that is useful, and not patronising (I'm just sort of sitting on the fence of this email list).

Max

On 3 Jun 2011, at 20:57, marijane white <marijane.wh...@gmail.com> wrote:
> I think that's quite possible. Here are a couple references I am familiar with. [...]
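As a rough illustration of the kind of weighting Max mentions, here is a minimal Python sketch of Okapi BM25 scoring over pre-tokenised documents. The function name `bm25_scores`, the toy documents, the parameter values k1 = 1.2 and b = 0.75, and the smoothed IDF variant (similar to the one Lucene uses) are assumptions for the example, not details from this thread; Solr/Lucene's own implementation differs in detail.

```python
import math
from collections import Counter

# Assumed defaults: k1 and b are commonly used BM25 settings,
# not values taken from this thread.
K1 = 1.2
B = 0.75


def bm25_scores(query_terms, docs, k1=K1, b=B):
    """Score each tokenised document (a list of terms) against the query."""
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: the number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # terms absent from this document contribute nothing
            # IDF smoothed with "1 +" inside the log so it never goes negative;
            # terms found in fewer documents get more weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b scales the penalty for
            # documents longer than average.
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "precision and recall of keyword search".split(),
    "recall recall recall".split(),
    "library catalogue metadata fields".split(),
]
scores = bm25_scores(["recall"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [1, 0, 2]
```

Unlike an exact-match database query, the output is a ranking rather than a set, which is what rank-based measures such as average precision assess.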
Re: [CODE4LIB] Precision and Recall
On Sat, Jun 4, 2011 at 4:56 AM, Max L. Wilson <m.l.wil...@swansea.ac.uk> wrote:
> Modern Information Retrieval is certainly an authoritative text on IR. You might be good searching for ACM SIGIR references on TFxIDF or BM25 by Stephen Robertson and Karen Spärck Jones. From the 70s. These were retrieval algorithms that significantly improved precision and recall of keyword search systems.

Just wanted to add that the SIGIR Museum (http://www.sigir.org/museum/contents.html) includes [Spärck Jones, Karen, ed. (1981). Information Retrieval Experiment. Butterworths.] There are several chapters by KSJ, including a review of tests from 1958-1978, as well as a separate chapter on the Cranfield tests.

There are different issues involved in evaluating interactive information retrieval systems. See [Kelly, Diane (2009). Methods for Evaluating Interactive Information Retrieval Systems with Users (http://ils.unc.edu/%7Edianek/FnTIR-Press-Kelly.pdf). Foundations and Trends in Information Retrieval, Vol. 3, Nos. 1-2, pp. 1-224.]

Of course, the only true measure of an IR system is whether it gets every searcher their documents and saves their time.

Simon
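Simon's references concern test-collection evaluation of the Cranfield kind, where a system's ranked output is scored against fixed relevance judgements. Here is a minimal Python sketch of the average precision and MAP measures Max described earlier in the thread. The function names, the toy runs, and the choice to divide by the total number of relevant documents (so relevant documents that are never retrieved count as zero, as in TREC-style MAP) are assumptions; other normalisations exist.

```python
def average_precision(ranked_ids, relevant_ids, cutoff=None):
    """Average precision of one ranked result list for one query.

    Precision is taken at each rank where a relevant document appears and
    summed; dividing by the total number of relevant documents means any
    relevant document never retrieved (or below the cutoff) counts as zero.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    if cutoff is not None:
        ranked_ids = ranked_ids[:cutoff]  # only assess the top N results
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(runs, cutoff=None):
    """Mean of average precision over M queries.

    runs is a list of (ranked_ids, relevant_ids) pairs, one per query.
    """
    return sum(average_precision(r, rel, cutoff) for r, rel in runs) / len(runs)


runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # hits at ranks 1 and 3
    (["d5", "d6", "d7"], {"d6", "d9"}),        # d9 is never retrieved
]
print(average_precision(*runs[0]))  # (1/1 + 2/3) / 2
print(average_precision(*runs[1]))  # (1/2) / 2 = 0.25
print(mean_average_precision(runs))
```

As Simon notes, batch measures like these do not capture the issues involved in evaluating interactive systems with users.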
Re: [CODE4LIB] Precision and Recall
The questions seem related to search engines; or perhaps you should be googling for full-text indexes, or by their more correct name, inverted indexes. In the normal scheme of events, databases return exactly what you ask for.

Dave Caroline

On Fri, Jun 3, 2011 at 4:18 PM, Fleming, Declan <dflem...@ucsd.edu> wrote:
> Hi folks! I got an interesting question from one of our librarians working on a paper, and we want to include a bit about the qualities of a database, such as precision and recall. She is looking for references. I did the Google/Wikipedia lookups, but I'm sure she's done that too:
> http://en.wikipedia.org/wiki/Information_retrieval
> and here: http://en.wikipedia.org/wiki/Precision_and_recall
> If this subject resonates with anyone, give me a shout and maybe some links.
> Thanks!
> Declan
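Dave's point can be illustrated with a toy inverted index: each term maps to the set of documents containing it, and a Boolean AND query intersects those sets. This minimal Python sketch assumes naive whitespace tokenisation, and the names `build_inverted_index` and `boolean_and` are hypothetical. The query returns exactly the documents that match every term, nothing more and nothing less.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of ids of the documents containing it.

    docs maps document id -> text. Tokenisation is deliberately naive
    (lower-case, split on whitespace); real engines do far more.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, query):
    """Exact-match retrieval: ids of documents containing every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    # Intersecting the posting sets avoids scanning every document.
    return set.intersection(*postings)


docs = {
    1: "Precision and recall in IR",
    2: "Recall of catalogue records",
    3: "Inverted index basics",
}
index = build_inverted_index(docs)
print(boolean_and(index, "recall"))            # {1, 2}
print(boolean_and(index, "precision recall"))  # {1}
print(boolean_and(index, "recall index"))      # set()
```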
Re: [CODE4LIB] Precision and Recall
Dave Caroline <dave.thearchiv...@gmail.com> wrote:
> The questions seem related to search engines or should you be googling for full text indexes or the other more correct name inverted index. Because in the normal scheme of events databases return exactly what you ask for.

One could argue that the same thing happens with search engines. After all, both databases and search engines are deterministic programs that return a set of records in response to a query. Precision and recall are not determined by what you ask; what defines them is how relevant the output records are to a real-life question. They aren't tied to a technology.

Of course, it can be more or less difficult to translate this question into a query, and the program might be more or less smart while processing the query. In my opinion, both aspects affect precision and recall. Anybody who has ever used a bibliographic database with Google-like queries can testify that a database can have extremely poor precision and recall in some use cases ;-)

Best regards,
Alain Borel
EPFL
Bibliothèque
Rolex Learning Center
1015 Lausanne (Switzerland)
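Alain's point, that precision and recall are measured against what is relevant to the searcher's real-life question rather than against the query, can be made concrete with a small Python sketch. The record ids, the relevance judgements, and the two queries' result sets are invented for illustration. Both result sets are exactly what their queries matched, yet they score differently, and the broader one gains recall at the cost of precision, the trade-off Max describes.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall.

    precision = len(retrieved & relevant) / len(retrieved)
    recall    = len(retrieved & relevant) / len(relevant)
    Empty denominators are reported as 0.0 (a convention, not a standard).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical relevance judgements: the records that actually answer the
# searcher's real-life question, decided independently of any query.
relevant = {"r1", "r2", "r3", "r4"}

# Two hypothetical exact-match queries; each returns exactly what it matched.
narrow = {"r1", "r2"}                          # e.g. a precise field search
broad = {"r1", "r2", "r3", "x1", "x2", "x3"}   # e.g. a looser keyword search

print(precision_recall(narrow, relevant))  # (1.0, 0.5)
print(precision_recall(broad, relevant))   # (0.5, 0.75)
```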
Re: [CODE4LIB] Precision and Recall
Hi - I'm wondering if she is using a definition of "database" that seems to be common in libraries, meaning a resource on the web that we pay for.

D

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Alain Borel
Sent: Friday, June 03, 2011 10:24 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Precision and Recall

> One could argue that the same thing happens with search engines. [...]
Re: [CODE4LIB] Precision and Recall
I think that's quite possible. Here are a couple of references I am familiar with.

Walker/Janes/Tenopir's Online Retrieval is a bit dated, but it does discuss the subject of precision and recall in bibliographic database searching.
http://books.google.com/books?id=Srn3Jg7O4XoC&lpg=PP1&pg=PP1#v=onepage&q&f=false

Beyond bibliographic databases, Baeza-Yates/Ribeiro-Neto's Modern Information Retrieval discusses the subject in a broader context.
http://www.amazon.com/Modern-Information-Retrieval-Concepts-Technology/dp/0321416910

-marijane

On Fri, Jun 3, 2011 at 10:53 AM, Fleming, Declan <dflem...@ucsd.edu> wrote:
> Hi - I'm wondering if she is using a definition of database that seems to be common in libraries, that means a resource on the web that we pay for. [...]