Re: Open Relevance Requirements Questions

Itamar Syn-Hershko Tue, 21 Sep 2010 11:55:28 -0700

Addressing most of the recent discussion below...

On 16/9/2010 4:24 AM, Dan Cardin wrote:

    1. Should the Open Relevance viewer be capable of importing text and
    images?

Corpora, IMO, should be text-only and index-ready (i.e. no special parsing required). This is what I assumed in Orev as well (see below).

    Is the objective of the Open Relevance Viewer to provide a crowd-sourcing
    tool that can have its data annotated, and then to use the annotated data for
    determining the performance of machine learning techniques/algorithms? Or
    is it to provide a generic crowd-sourcing tool for academics, government and
    industry to annotate data with? Or am I missing the point?

This tool should be, as Grant and Mark mentioned, engine-agnostic. It should give those interested the means to judge the effectiveness of different engines, and also of different methods within the same engine.

Hence, the most basic implementation should be able to handle many corpora and topics in more than one (natural) language. The crowd-sourcing portion of it is where a user creates judgments: e.g. views a document from a corpus side by side with a topic and marks it "Relevant", "Non-relevant" (or "Skip this").

After several hundred human-hours, this simple implementation will produce packages containing corpora, topics and judgments for several languages. These can then serve as a basis for more sophisticated parts of the project, of which relevance ranking of actual query results, TREC-like testing, MAP/MRR and user behavior tracking are just examples. In other words, IMHO Grant's view goes a bit too far for this stage, where there is still a lot of fundamental work to do.
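For reference, here is a minimal sketch of how the MAP/MRR metrics mentioned above could be computed once such judgments exist. It assumes binary judgments, with "Skip this" documents simply left out of the relevant set; the function names and data shapes are hypothetical and not part of Orev or any existing tool.

```python
from typing import Callable, Dict, List, Sequence, Set


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked result list for one topic.

    Relevant documents that were never retrieved contribute precision 0.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant)


def reciprocal_rank(ranking: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0 if none was retrieved."""
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_over_topics(
    runs: Dict[str, List[str]],
    qrels: Dict[str, Set[str]],
    metric: Callable[[Sequence[str], Set[str]], float],
) -> float:
    """Average a per-topic metric over all judged topics (MAP or MRR).

    Assumption: topics with judgments but no results count as 0.
    """
    if not qrels:
        return 0.0
    return sum(metric(runs.get(t, []), rel) for t, rel in qrels.items()) / len(qrels)
```

For example, with `runs = {"T1": ["d3", "d1", "d7"], "T2": ["d2", "d5"]}` and `qrels = {"T1": {"d1", "d7"}, "T2": {"d9"}}`, MRR comes out as 0.25. Averaging over judged topics (rather than over topics present in the run) is just one possible convention.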

Robert, from the discussion we had a while ago I gathered you are thinking the same?

Once such data exists in a central system, importing corpora and topics, and exporting them back with judgments in various formats (TREC, CLEF, FIRE), can be done fairly easily. We just need to make sure that the system stores all data correctly.
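As an illustration of how simple such an export can be, here is a sketch that turns stored judgments into TREC qrels lines ("topic iteration docno relevance", with the iteration column conventionally 0). The `Verdict`/`Judgment` types are hypothetical stand-ins for whatever the DB actually stores; "Skip this" entries are dropped because they carry no judgment.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Verdict(Enum):
    RELEVANT = "Relevant"
    NON_RELEVANT = "Non-relevant"
    SKIP = "Skip this"


@dataclass(frozen=True)
class Judgment:
    topic_id: str
    doc_id: str
    verdict: Verdict


def to_trec_qrels(judgments: Iterable[Judgment]) -> List[str]:
    """Render judgments as TREC qrels lines: "topic iteration docno relevance".

    The iteration column is written as 0. Skipped documents are omitted.
    Output is sorted by (topic_id, doc_id) as strings.
    """
    lines = []
    for j in sorted(judgments, key=lambda x: (x.topic_id, x.doc_id)):
        if j.verdict is Verdict.SKIP:
            continue
        relevance = 1 if j.verdict is Verdict.RELEVANT else 0
        lines.append(f"{j.topic_id} 0 {j.doc_id} {relevance}")
    return lines
```

For example, a non-relevant judgment for `DOC-2` and a relevant one for `DOC-1` under topic `401`, plus a skipped `DOC-9` under topic `402`, export as `401 0 DOC-1 1` and `401 0 DOC-2 0`. If several users judge the same topic/document pair, some aggregation (e.g. majority vote) would be needed before export; that is left out here.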

Sorry for bringing this up again, but I think I have already done most of that work, so there is no need for redundant effort. In Orev I have already spec'd and implemented all of the above. What is missing is a better GUI and user management. I suggest you have a look at it and at its DB schema: http://github.com/synhershko/Orev/blob/master/Orev.png

    How are annotations used for judgments obtained? Separate file, specified by the
    user?

If a tool like Orev is used, this data can be pulled directly from its DB by the actual test tools (if they are separate).

    Can you provide me with a direct link to the TREC format?

http://trec.nist.gov/pubs/trec1/papers/01.txt

But if we are not going to base data storage on the file system, there is no need to stick to a particular format, except when exporting judgments...


Itamar
