Addressing most of the recent discussion below...

On 16/9/2010 4:24 AM, Dan Cardin wrote:
    1. Should the Open Relevance viewer be capable of importing text and
    images?
Corpora, IMO, should be text-only and index-ready (i.e. no special parsing required). This is what I assumed in Orev as well (see below).

    Is the objective of the Open Relevance Viewer to provide a crowd-sourcing
    tool that can have its data annotated, and then to use the annotated data
    for determining the performance of machine learning techniques/algorithms?
    Or is it to provide a generic crowd-sourcing tool for academics,
    government, and industry to annotate data with? Or am I missing the point?
This tool should be, as Grant and Mark mentioned, engine-agnostic. It should provide those interested with tools to judge the effectiveness of different engines, and also of different methods with the same engine.

Hence, the most basic implementation should be able to handle many corpora and topics in more than one (natural) language. The crowd-sourcing portion of it is where a user creates judgments - e.g. views a document from a corpus side by side with a topic, and marks it "Relevant", "Non-relevant" (or "Skip this").
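To make the judging step concrete, here is a minimal sketch of what a stored judgment could look like. All names here (Judgment, record_judgment, the verdict strings) are illustrative assumptions, not taken from Orev or any existing spec:

```python
from dataclasses import dataclass

# Illustrative verdicts matching the three choices described above.
VERDICTS = ("relevant", "non-relevant", "skip")

@dataclass(frozen=True)
class Judgment:
    corpus_id: str   # which corpus the document came from
    topic_id: str    # the topic shown next to the document
    doc_id: str      # the judged document
    user_id: str     # who made the judgment (crowd-sourcing)
    verdict: str     # one of VERDICTS

def record_judgment(store, corpus_id, topic_id, doc_id, user_id, verdict):
    """Validate and append one judgment to a store (here, a plain list)."""
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    store.append(Judgment(corpus_id, topic_id, doc_id, user_id, verdict))

store = []
record_judgment(store, "sample-corpus", "topic-051",
                "AP890101-0001", "user42", "relevant")
```

The point of keeping the record this small is that everything downstream (export formats, metrics) can be derived from it.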

After several hundred human-hours, this basic implementation will result in packages containing corpora, topics and judgments for several languages. These can then be used as the basis for more sophisticated parts of the project; relevance ranking of actual query results, TREC-like testing, MAP/MRR and user-behavior tracking are just examples. In other words, IMHO Grant's view goes a bit too far for this stage, where there's still a lot of fundamental work to do.
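For reference, the MAP/MRR metrics mentioned above are straightforward to compute once judgments exist. A sketch (function names and the toy data are mine, not from any project code):

```python
def average_precision(ranked_doc_ids, relevant):
    """AP for one topic: mean of precision@k over the ranks k
    at which relevant documents appear, divided by |relevant|."""
    hits = 0
    precision_sum = 0.0
    for k, doc in enumerate(ranked_doc_ids, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked_doc_ids, relevant):
    """RR for one topic: 1/rank of the first relevant document, else 0."""
    for k, doc in enumerate(ranked_doc_ids, start=1):
        if doc in relevant:
            return 1.0 / k
    return 0.0

# Toy example: relevant docs found at ranks 1 and 3.
ap = average_precision(["d1", "d2", "d3"], {"d1", "d3"})  # (1/1 + 2/3) / 2
```

MAP and MRR are then just the means of these values over all topics.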

Robert, from the discussion we had a while ago I gathered you are thinking the same?

Once such data exists in a central system, importing corpora and topics, and exporting them back with judgments in various formats (TREC, CLEF, FIRE), can be done fairly easily. We just need to make sure the system stores all data correctly.
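As an example of how simple the export side is, here is a sketch that writes judgments as TREC-style qrels lines (the standard "topic iteration docno relevance" whitespace-separated format used by trec_eval). The field names on the input dicts are assumptions for illustration:

```python
def to_qrels(judgments):
    """Render judgments as TREC qrels lines: topic, iteration (0),
    document id, and a binary relevance flag."""
    lines = []
    for j in judgments:
        rel = 1 if j["verdict"] == "relevant" else 0
        lines.append(f"{j['topic_id']} 0 {j['doc_id']} {rel}")
    return "\n".join(lines)

judgments = [
    {"topic_id": "051", "doc_id": "AP890101-0001", "verdict": "relevant"},
    {"topic_id": "051", "doc_id": "AP890102-0002", "verdict": "non-relevant"},
]
print(to_qrels(judgments))
# 051 0 AP890101-0001 1
# 051 0 AP890102-0002 0
```

CLEF and FIRE exporters would be similar small functions over the same stored records, which is why getting the storage right matters more than picking one file format up front.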

Sorry for bringing this up again, but I think I have already done most of that work, so no need for redundant effort. In Orev I have already spec'd and implemented all of the above. What is missing is a better GUI and user management. I suggest you have a look at it and at its DB schema: http://github.com/synhershko/Orev/blob/master/Orev.png

    How are annotations used for judgments obtained? Separate file, specified
    by the user?
If a tool like Orev is used, then this data can be pulled directly from its DB by the actual test tools (if they are separate).

    Can you provide me with a direct link to the TREC format?

http://trec.nist.gov/pubs/trec1/papers/01.txt

But if we are not going to base data storage on the FS, there's no need to stick to a particular format except when exporting judgments...

Itamar
