Hi all,

Following a discussion with Robert, I have started working on a viewer application meant to make viewing and judging corpora and topics as easy as possible. I want development to be as rapid as possible. I'm building it with .NET (NHibernate / ASP.NET MVC).


Below are some remarks and a high-level description. I'd like to gather early feedback and ideas, but please note that my first goal is to get something functional.


FILEFORMATS.txt defines the file structures, but because the viewer works against a database, those formats will only be honored by the export functions. See the attached image for the domain model.


Each corpus entry in the database points to a file-system path (which could also be remote, e.g. accessed over HTTP). The viewer loads the corpus files one by one, and each judgment is saved with the corpus ID, the topic ID and a string representation of the document filename. The corpus and topic IDs are integers, while the document ID is a string, so generated corpora (e.g. those exported from a wiki dump) can use a base-24 representation of an ID as the document filename.
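To make the judgment key and the base-24 naming concrete, here is a minimal sketch. It is written in Python only for brevity, since the viewer itself is .NET. The digit alphabet (`0-9` followed by `a-n`) and the names `to_base24`, `from_base24` and `JudgmentKey` are hypothetical. The post does not say which symbols the base-24 filenames use.

```python
from dataclasses import dataclass

# Hypothetical base-24 alphabet: 0-9 then a-n (24 symbols).
# The post does not specify which digits the generated filenames use.
DIGITS = "0123456789abcdefghijklmn"


def to_base24(n: int) -> str:
    """Encode a non-negative integer ID (e.g. a wiki-dump article ID) in base 24."""
    if n < 0:
        raise ValueError("IDs must be non-negative")
    if n == 0:
        return DIGITS[0]
    out = []
    while n:
        n, r = divmod(n, 24)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def from_base24(s: str) -> int:
    """Decode a base-24 document name back to its integer ID."""
    n = 0
    for ch in s:
        n = n * 24 + DIGITS.index(ch)  # raises ValueError on an unknown symbol
    return n


@dataclass(frozen=True)
class JudgmentKey:
    """Hypothetical key for one saved judgment."""
    corpus_id: int    # integer corpus ID
    topic_id: int     # integer topic ID
    document_id: str  # document filename, kept as a string


# A judgment for article 123456 of corpus 1 under topic 7.
key = JudgmentKey(corpus_id=1, topic_id=7, document_id=to_base24(123456))  # "8m80"
```

Keeping `document_id` a string means hand-made corpora with arbitrary filenames and generated corpora with base-24 names share the same key. Because the dataclass is frozen, the key is also hashable and can index an in-memory cache.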


Contrary to what FILEFORMATS.txt states, a corpus will not be stored in a gzipped file.

The above approach may allow more than one person to judge the same document for the same topic at the same time. That is undesirable because it wastes the judges' time (there is no need for double judgment). I'll probably resolve this by implementing a HiLo-like mechanism (or pooling), but I'm leaving that for later.
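One way a HiLo-like scheme could prevent double judgment is sketched below, under loud assumptions. A shared "hi" counter per (corpus, topic) is bumped once per request, and each judge then works through the "lo" positions inside the block of document indices they received. Two judges therefore never receive the same document. The class name `DocumentBlockAllocator`, the block size and the in-memory counter are all hypothetical. In the real viewer the counter would presumably live in the database.

```python
import threading
from collections import defaultdict


class DocumentBlockAllocator:
    """Hands out disjoint blocks of document indices per (corpus, topic).

    Hypothetical, HiLo-inspired: the shared "hi" counter is incremented
    once per block, and the caller walks the "lo" offsets of its block.
    """

    def __init__(self, block_size: int = 10):
        self.block_size = block_size
        # (corpus_id, topic_id) -> number of the next block to hand out
        self._next_hi = defaultdict(int)
        self._lock = threading.Lock()

    def allocate(self, corpus_id: int, topic_id: int, corpus_size: int) -> range:
        """Reserve the next block of indices; an empty range means none are left."""
        with self._lock:
            key = (corpus_id, topic_id)
            hi = self._next_hi[key]
            self._next_hi[key] = hi + 1
        # Clip the block to the corpus so indices never run past the last document.
        start = min(hi * self.block_size, corpus_size)
        end = min(start + self.block_size, corpus_size)
        return range(start, end)


allocator = DocumentBlockAllocator(block_size=3)
first = allocator.allocate(corpus_id=1, topic_id=7, corpus_size=8)   # indices 0-2
second = allocator.allocate(corpus_id=1, topic_id=7, corpus_size=8)  # indices 3-5
```

This sketch never reclaims a block that a judge abandons halfway through. A pooling variant with expiring reservations would handle that case, at the cost of tracking who holds which document.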

The web application will allow users to submit new topics per language and to judge documents for a topic. The judgment screen will show the topic at the top, navigation on the left, and the document in the rest of the screen. The user can choose "Relevant", "Irrelevant" or "Skip".
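The three choices on the judgment screen could be modelled as shown below. This is a minimal Python sketch. The `Verdict` enum, `record_choice` and the dictionary that stands in for the judgments table are hypothetical. The post also does not say whether "Skip" is stored. The sketch assumes it is not, so the document stays available for a later judgment.

```python
from enum import Enum
from typing import Dict, Tuple


class Verdict(Enum):
    RELEVANT = "Relevant"
    IRRELEVANT = "Irrelevant"
    SKIP = "Skip"


# Hypothetical stand-in for the judgments table:
# (corpus_id, topic_id, document_id) -> verdict
JudgmentStore = Dict[Tuple[int, int, str], Verdict]


def record_choice(store: JudgmentStore, corpus_id: int, topic_id: int,
                  document_id: str, choice: Verdict) -> bool:
    """Persist Relevant/Irrelevant; assume Skip saves nothing.

    Returns True if a judgment was stored.
    """
    if choice is Verdict.SKIP:
        return False
    store[(corpus_id, topic_id, document_id)] = choice
    return True
```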

Users can filter by language so that they see only the topics relevant to them. Filtering uses a language string (e.g. "en-US") stored per topic and per corpus.
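The filter itself can be sketched as follows, assuming the user's filter is an exact match on the stored language string. The `Topic` type and the function names are hypothetical. The comparison ignores case because BCP 47 language tags such as "en-US" are case-insensitive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Topic:
    topic_id: int
    title: str
    language: str  # language string stored per topic, e.g. "en-US"


def same_language(a: str, b: str) -> bool:
    """Compare language tags case-insensitively ("en-US" == "EN-us")."""
    return a.casefold() == b.casefold()


def topics_for_language(topics: List[Topic], language: str) -> List[Topic]:
    """Return only the topics whose language string matches the user's filter."""
    return [t for t in topics if same_language(t.language, language)]
```

The same comparison would apply to the per-corpus language string. With exact matching, "en-GB" does not match "en-US". Matching only the primary subtag ("en") would be a looser alternative if that turns out to be more useful.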

That's about it for now. Looking forward to your feedback.


Itamar.
