Re: Benchmarking on GOV2

Marvin Humphrey Mon, 29 May 2006 11:12:15 -0700


On May 29, 2006, at 10:58 AM, Andrzej Bialecki wrote:

Has anyone used existing categorization data associated with the Reuters corpus to build a benchmarker that measured IR precision and/or recall?
That would be RCV1 or RCV2, right? AFAIK the Reuters-21578 has no such information ... The use of RCV1/RCV2 is subject to a more stringent license than Reuters-21578, so that few people would be able to actually run the benchmarks.

21578 has categorization information. Here's a snippet from one of the SGML files (note the TOPICS tag):

<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="5562" NEWID="19">

<DATE>26-FEB-1987 15:26:54.12</DATE>
<TOPICS><D>wheat</D><D>grain</D></TOPICS>
<PLACES><D>yemen-arab-republic</D><D>usa</D></PLACES>
<PEOPLE></PEOPLE>
<ORGS></ORGS>
<EXCHANGES></EXCHANGES>
<COMPANIES></COMPANIES>
<UNKNOWN>
&#5;&#5;&#5;C G
&#22;&#22;&#1;f0798&#31;reute
u f BC-/BONUS-WHEAT-FLOUR-FO   02-26 0096</UNKNOWN>
<TEXT>&#2;
<TITLE>BONUS WHEAT FLOUR FOR NORTH YEMEN  -- USDA</TITLE>

I'm not sure how to use this info, though -- I'm just investigating whether there's prior art before I start thinking hard about it.
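
For illustration only, here is one way the TOPICS data might be put to work. It is a rough sketch under stated assumptions, not an established method from this thread. It assumes that a topic label such as "wheat" can stand in for a query, and that every document tagged with that label counts as relevant. The function names (topics_by_doc, relevant_for, precision_recall) are hypothetical, and regular expressions are used because the SGML files are not well-formed XML.

```python
import re

# Each article starts at a <REUTERS ...> tag carrying a NEWID attribute.
REUTERS_RE = re.compile(
    r'<REUTERS\b[^>]*\bNEWID="(\d+)"[^>]*>(.*?)(?=<REUTERS\b|\Z)', re.DOTALL
)
TOPICS_RE = re.compile(r"<TOPICS>(.*?)</TOPICS>", re.DOTALL)
D_RE = re.compile(r"<D>(.*?)</D>")


def topics_by_doc(sgml):
    """Map each NEWID to the set of topic labels inside its TOPICS tag."""
    result = {}
    for match in REUTERS_RE.finditer(sgml):
        newid, body = match.group(1), match.group(2)
        topics = TOPICS_RE.search(body)
        result[newid] = set(D_RE.findall(topics.group(1))) if topics else set()
    return result


def relevant_for(topics, label):
    """Assumption: a document is relevant to `label` if it is tagged with it."""
    return {newid for newid, labels in topics.items() if label in labels}


def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query's result list."""
    retrieved = set(retrieved)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

A benchmarker along these lines would run a search for each topic label, collect the NEWIDs of the hits, and pass them to precision_recall together with relevant_for(topics, label). Whether topic labels make good relevance judgments for free-text queries is an open question.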


Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

