Hi everybody,

I have been working on the idea of creating an interface for tagged
corpora for the past few days. I have finished the coding challenge and
gone through the documentation PDF and wikis, and I am currently
designing the interface. I am posting the mockups here so that I can
get your opinions and improve on them.

There are currently three major interfaces:

a) Manual disambiguator

b) .prob evaluator

c) .tsx file editor

All the mockups are collected at http://imgur.com/a/4uk4q#r5Ur8jT, and
each section below also links to its own mockup.


*a) Manual disambiguator*

Mockup: http://i.imgur.com/r5Ur8jT.png

Functions:

   - Jump to the next ambiguous lexical unit, or to an adjacent lexical unit,
   using the keyboard or mouse.
   - A quick view, bound to a key, that hides the tags and shows the raw text
   - If the .tsx file is provided, information such as the coarse tags and
   the applicable forbid and enforce rules can also be displayed.
   - Show statistics of disambiguation
   - Compile and apply constraint grammar rules to the buffer
   - List the applied constraint grammar rules
   - Train and test the tagger (a prompt will ask which part of the corpus
   to use as testing data).
   - Train the tagger and export the .prob file
   - Save progress (this saves the corpus and also creates a project
   description file that keeps track of the morphological analyser and .tsx
   files used, so that it is easier to resume tagging)
   - The interface will be keyboard-centric, though it will be equally
   functional with a mouse.
   - Default keymaps will be provided, and the bindings can be changed to
   suit the user.

For example:

[P] - <previous-ambiguous>

[N] - <next-ambiguous>

[F] - <forward-word>

[B] - <back-word>

[1], [2], [3], [4] - choose the correct lexical form
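
To make the keymap idea a bit more concrete, here is a minimal sketch in Python of how default bindings and user rebinding could be represented. The action names for choosing a lexical form and the `rebind` and `dispatch` helpers are placeholders made up for illustration; nothing here is an existing Apertium API.

```python
# Hypothetical default keymap: key -> action name.
# The "choose-form-N" action names are illustrative assumptions.
DEFAULT_KEYMAP = {
    "p": "previous-ambiguous",
    "n": "next-ambiguous",
    "f": "forward-word",
    "b": "back-word",
    "1": "choose-form-1",
    "2": "choose-form-2",
    "3": "choose-form-3",
    "4": "choose-form-4",
}


def rebind(keymap, key, action):
    """Return a new keymap in which `action` is bound to `key` only.

    Any previous binding of `key`, and any other key that pointed at
    `action`, is dropped. The original keymap is left untouched.
    """
    updated = {k: a for k, a in keymap.items() if k != key and a != action}
    updated[key] = action
    return updated


def dispatch(keymap, key):
    """Look up the action for a key press (case-insensitive), or None."""
    return keymap.get(key.lower())
```

With this, `rebind(DEFAULT_KEYMAP, "j", "next-ambiguous")` would give a keymap where [J] jumps to the next ambiguous lexical unit and [N] is unbound.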

*Evaluating the tagger*

Functions:


   - The trained tagger can be evaluated immediately by setting aside x% of
   the corpus as testing data.
   - Alternatively, it can be evaluated with the .prob evaluator on an
   unrelated corpus.
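
The x% hold-out could look roughly like the sketch below. It assumes the corpus is already available as a list of sentences; `split_corpus` and its parameters are hypothetical names, and shuffling before the split is my own choice rather than something the design fixes.

```python
import random


def split_corpus(sentences, test_percent, seed=0):
    """Split a list of sentences into (training, testing) parts.

    `test_percent` is the share of the corpus, from 0 to 100, to set
    aside as testing data. A fixed seed keeps the split reproducible.
    """
    if not 0 <= test_percent <= 100:
        raise ValueError("test_percent must be between 0 and 100")
    shuffled = list(sentences)          # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_percent / 100)
    return shuffled[n_test:], shuffled[:n_test]
```

Whether to shuffle or to take a contiguous slice of the corpus is an open design question; a contiguous slice would keep sentence context intact.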

*Loading the corpus*

The available options are:

   1. Load a raw-text file, a morphological analyser and (optionally) a
   .tsx file
   2. Continue an existing project
   3. Pull a wiki dump and use it as the corpus
   [ http://i.imgur.com/F9OXMs4.png ]



*b) .prob evaluator*

Mockup: http://i.imgur.com/fIo6rV9.png

Functions:

   - Input the .prob file and the manually disambiguated corpus, along with
   either the morphologically analysed corpus or the morphological analyser
   for the language.
   - Evaluate the .prob file and display statistics about tagger accuracy
   - Generate a log file containing the diff between the provided tagged
   corpus and the corpus disambiguated by the tagger. This makes it easier
   to frame new sentences to add to the corpus, giving the tagger more
   context.
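
As a rough illustration of the accuracy figure and the diff log, here is a sketch using Python's standard `difflib`. It assumes both corpora have already been read into lists with one lexical unit per entry and that the two lists are aligned; the function names are hypothetical.

```python
import difflib


def tagging_accuracy(gold, predicted):
    """Fraction of lexical units where the tagger agrees with the gold corpus."""
    if len(gold) != len(predicted):
        raise ValueError("corpora must contain the same number of lexical units")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def tagging_diff(gold, predicted):
    """Unified diff between the gold corpus and the tagger output, as lines."""
    return list(
        difflib.unified_diff(
            gold, predicted, fromfile="gold", tofile="tagger", lineterm=""
        )
    )


# Illustrative data only, written in the Apertium stream format.
gold = ["^the<det><def><sp>$", "^book<n><sg>$", "^is<vbser><pri><p3><sg>$"]
predicted = ["^the<det><def><sp>$", "^book<vblex><inf>$", "^is<vbser><pri><p3><sg>$"]
accuracy = tagging_accuracy(gold, predicted)
log_lines = tagging_diff(gold, predicted)
```

Here the log would contain a `-^book<n><sg>$` line followed by a `+^book<vblex><inf>$` line, pointing at the lexical unit the tagger got wrong.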

*c) .tsx file editor*

Mockups:

TSX Viewer: http://i.imgur.com/pVdsIem.png

Templates: http://i.imgur.com/hFGIQHR.png

Functions:

   - Add new tags
      - categories
      - multi-categories
      - forbid
      - enforce
      - prefer
   - Templates for adding new tags
   - Change the order of the tags within the same parent tag (more specific
   categories must be defined before more general ones). The nodes in the
   XML viewer can also be made draggable within the same parent node to
   make reordering easier.
   - Search within tags for faster navigation.
   - Validate the tagger definition
   - Editor features such as syntax highlighting, auto-indentation and tag
   completion in the Node Contents text view, for complex in-place manual
   editing.
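
The reordering within a parent node could be sketched as below with Python's `xml.etree.ElementTree`. The `move_child` helper is a hypothetical name, and the sample fragment only follows my understanding of the .tsx format (`tagset`, `def-label`, `tags-item`); treat both the element names and the tag patterns as assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; the general category is listed first here,
# which is the wrong order if specific categories must come first.
SAMPLE = """<tagset>
  <def-label name="NOUN"><tags-item tags="n.*"/></def-label>
  <def-label name="NOUNPL"><tags-item tags="n.*.pl"/></def-label>
</tagset>"""


def move_child(parent, old_index, new_index):
    """Move a child element to a new position under the same parent."""
    children = list(parent)
    if not (0 <= old_index < len(children) and 0 <= new_index < len(children)):
        raise IndexError("index out of range for this parent node")
    child = children[old_index]
    parent.remove(child)
    parent.insert(new_index, child)


root = ET.fromstring(SAMPLE)
move_child(root, 1, 0)  # put the more specific category first
order = [label.get("name") for label in root]
```

A drag-and-drop action in the viewer would then only need to translate the drop position into `new_index`.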


Looking forward to hearing from you. :)


Regards,

Mihir Rege,

Second Year Undergraduate,

Department of Computer Science and Engineering,

IIT Kharagpur.
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff