Hi Gema,

Thanks for the input. I have described the changes I made in response to it
inline.

On Tue, Apr 30, 2013 at 4:26 PM, Gema Ramírez-Sánchez <[email protected]> wrote:

> Dear Mihir,
>
> thank you for your effort, see my comments inline:
>
>
> On Sun, Apr 28, 2013 at 6:17 PM, Mihir Rege <[email protected]> wrote:
>
>> Hi everybody,
>>
>> I have been working on the idea of creating an interface for tagged
>> corpora for the past few days. I have finished the coding challenge, have
>> gone through the documentation PDF and the wikis, and am currently working
>> on designing the interface. I am posting the mockups for the interface so
>> that I can get your opinion and improve on them.
>>
>
> Great!
>
>
>> There are currently three major interfaces:
>>
>> a) Manual disambiguator
>>
>> b) .prob evaluator
>>
>> c) .tsx file editor
>>
>> I have put up all the mockups together at
>> http://imgur.com/a/4uk4q#r5Ur8jT and have also put links in separate
>> sections.
>>
>>
>>
> Let's take a look. I like the idea of having access to all the tasks
> involved in working with taggers from a single interface. It is difficult
> to organise because the process can be non-linear, but it is still a good
> idea.
>
>
>> *a) Manual disambiguator*
>>
>> Mockup: http://i.imgur.com/r5Ur8jT.png
>>
>> Functions:
>>
>>    - Jump to the next ambiguous lexical unit, or to an adjacent lexical
>>    unit, using the keyboard or mouse.
>>    - A quick view, bound to a key, that hides the tags and shows the raw
>>    text.
>>    - If the .tsx file is provided, information such as the coarse tags and
>>    the applicable forbid and enforce rules can also be displayed.
>>    - Show disambiguation statistics.
>>    - Compile and apply constraint grammar rules to the buffer.
>>    - List the applied constraint grammar rules.
>>    - Train and test the tagger (a prompt will ask which part of the corpus
>>    to use as testing data).
>>    - Train the tagger and export the .prob file.
>>    - Save progress (this saves the corpus and also creates a project
>>    description file that records the morphological analyser and .tsx files
>>    used, so that tagging is easier to resume).
>>    - The interface will be keyboard-centric, though it will be equally
>>    functional with a mouse.
>>    - Default keymaps will be provided, and the bindings can be changed to
>>    suit the user.
>>
>> For example:
>>
>> [P] - <previous-ambiguous>
>>
>> [N] - <next-ambiguous>
>>
>> [F] - <forward-word>
>>
>> [B] - <back-word>
>>
>> [1], [2], [3], [4] - choose the correct lexical form.
>>
>
>
> Cool functions, all very useful for the end user of this tool. I think one
> more function would be really useful, if possible: select one or more
> words, see the output of the morphological analyser for them, and be able
> to change the tagged corpus according to that output. This would help a lot
> when re-tagging the corpus to keep it consistent with changes in the
> dictionaries.
>
>
I have added two new features to keep the corpus consistent with changes in
the dictionaries:
1. A selection of one or more words can be re-analysed by the morphological
analyser and then re-tagged as usual, like any ambiguous lexical form.
2. On reopening a previously analysed corpus, the tool checks whether each
stored tag is still present in the current analysis of the word. It will
also show an alert if the corpus has become misaligned because a multiword
was added to the dictionary.
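A minimal sketch of the check in point 2, assuming the saved corpus holds one stored reading per lexical unit in the Apertium stream format (`^surface/reading$`) and that the current analyser output for the same text is available as a second stream. `check_against_analysis` is a hypothetical name, not part of any existing Apertium tool, and escaped characters are not handled.

```python
import re

# One lexical unit in the Apertium stream format: ^surface/reading1/reading2$
LU_PATTERN = re.compile(r"\^(.*?)\$")


def parse_units(stream):
    """Split a stream into (surface, readings) pairs; escaped characters are not handled."""
    units = []
    for body in LU_PATTERN.findall(stream):
        surface, *readings = body.split("/")
        units.append((surface, readings))
    return units


def check_against_analysis(tagged_stream, analysed_stream):
    """Compare a saved, disambiguated corpus with a fresh analysis of the same text.

    Returns (stale, misaligned_at):
      stale         -- indices of units whose stored reading the analyser no longer produces
      misaligned_at -- index of the first unit whose surface form differs (for example
                       after a new multiword was added to the dictionary), or None
    """
    tagged = parse_units(tagged_stream)
    analysed = parse_units(analysed_stream)
    stale = []
    for i, ((t_surface, t_readings), (a_surface, a_readings)) in enumerate(zip(tagged, analysed)):
        if t_surface != a_surface:
            # From here on the units no longer line up one-to-one, so stop and alert.
            return stale, i
        if t_readings and t_readings[0] not in a_readings:
            stale.append(i)
    if len(tagged) != len(analysed):
        # One stream ran out early: the corpus is misaligned at that point.
        return stale, min(len(tagged), len(analysed))
    return stale, None
```

Stopping at the first divergent surface form keeps the sketch simple; a fuller tool could try to realign the two streams by comparing the underlying text.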


>> *Evaluating the tagger*
>>
>> Functions
>>
>>
>>    - The trained tagger can be evaluated immediately by having an option
>>    of setting aside x% of the corpus as testing data.
>>    - Else, it can be evaluated using the .prob evaluator using an
>>    unrelated corpus.
>>
>
> That's fine. Often what we want to do is compare two .probs to see whether
> the "improved" one is doing better than the "old" one. We should maybe
> provide some way to make this easy: test with two .probs and show the
> differences to the user.
>
>
I have addressed this by allowing multiple .prob files to be evaluated on the
same corpus; the tool then shows a comparison of the results.
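A minimal sketch of such a comparison, assuming the tagger has already been run once per .prob file over the same input, so that each run yields a disambiguated stream in the Apertium format (`^surface/reading$`) aligned unit by unit with the manually disambiguated corpus. `compare_probs`, the labels and the "fixed"/"broken" markers are hypothetical names for illustration.

```python
import re

# One lexical unit in the Apertium stream format: ^surface/reading$
LU_PATTERN = re.compile(r"\^(.*?)\$")


def chosen_readings(stream):
    """Return the single reading of each disambiguated unit (escaped characters not handled)."""
    result = []
    for body in LU_PATTERN.findall(stream):
        surface, _, reading = body.partition("/")
        result.append(reading or surface)
    return result


def compare_probs(gold_stream, outputs):
    """Score several tagger outputs against the same manually disambiguated corpus.

    outputs maps a label (e.g. a .prob file name) to the stream the tagger produced
    with that .prob. Returns (accuracy per label, list of (index, "fixed"/"broken")
    for units where the second label differs in correctness from the first).
    """
    gold = chosen_readings(gold_stream)
    scores, correct = {}, {}
    for label, stream in outputs.items():
        predicted = chosen_readings(stream)
        if len(predicted) != len(gold):
            raise ValueError(f"{label}: {len(predicted)} units, gold corpus has {len(gold)}")
        correct[label] = [p == g for p, g in zip(predicted, gold)]
        scores[label] = sum(correct[label]) / len(gold) if gold else 0.0
    changes = []
    labels = list(outputs)
    if len(labels) >= 2:
        old, new = labels[0], labels[1]
        for i in range(len(gold)):
            if correct[old][i] != correct[new][i]:
                changes.append((i, "fixed" if correct[new][i] else "broken"))
    return scores, changes
```

Listing the units that flip between correct and incorrect, and not only the two accuracy figures, shows whether the "improved" .prob really fixes errors or merely trades them for new ones.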

>> *Loading the corpus*
>>
>> The available options are:
>>
>>    1. Load a raw-text file, a morphological analyser and a .tsx file
>>    (optional).
>>    2. Continue an existing project.
>>    3. Pull a Wikipedia dump and use it as the corpus
>>    [http://i.imgur.com/F9OXMs4.png]
>>
>>
>>
> I think we should add something about the language. The user should be
> able to define the language of the raw-text file or of the Wikipedia dump,
> and then get a list of available pairs or languages (if any) to create the
> first tagged but non-disambiguated version of the corpus.
>



After one of these options is selected, a dialog for choosing the
morphological analyser will appear, listing the available language pairs.

>
>>
>>
>> *b) .prob evaluator*
>>
>> Mockup: http://i.imgur.com/fIo6rV9.png
>>
>> Functions
>>
>>    - Input the .prob file and the manually disambiguated corpus, along
>>    with either the morphologically analysed corpus or the morphological
>>    analyser for the language.
>>    - Evaluate the .prob file and display statistics about tagger accuracy.
>>    - Generate a log file containing the diff between the provided tagged
>>    corpus and the corpus disambiguated by the tagger. This makes it easier
>>    to write new sentences to add to the corpus, giving the tagger more
>>    context.
>>
>>
> Great, see my comment about comparing two .probs above.
>
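A minimal sketch of the log file described above, assuming both corpora are in the Apertium stream format with exactly one reading per lexical unit and are already aligned unit by unit. `diff_log` and the tab-separated line layout are hypothetical choices, not an existing Apertium format. Printing a few neighbouring words next to each error is meant to show where the tagger lacks context.

```python
import re

# One lexical unit in the Apertium stream format: ^surface/reading$
LU_PATTERN = re.compile(r"\^(.*?)\$")


def parse_units(stream):
    """Return (surface, reading) for each disambiguated unit (escaped characters not handled)."""
    units = []
    for body in LU_PATTERN.findall(stream):
        # For a disambiguated unit, everything after the first "/" is its single reading.
        surface, _, reading = body.partition("/")
        units.append((surface, reading))
    return units


def diff_log(gold_stream, tagged_stream, context=2):
    """Build one log line for every unit where the tagger disagrees with the gold corpus.

    Each line holds the unit's index, up to `context` words on each side with the
    unit itself in brackets, the expected reading and the reading the tagger chose.
    """
    gold = parse_units(gold_stream)
    tagged = parse_units(tagged_stream)
    lines = []
    for i, ((surface, expected), (_, chosen)) in enumerate(zip(gold, tagged)):
        if expected == chosen:
            continue
        start, end = max(0, i - context), min(len(gold), i + context + 1)
        words = [f"[{w}]" if j == i else w for j, (w, _) in enumerate(gold[start:end], start)]
        lines.append(f"{i}\t{' '.join(words)}\texpected {expected}\tgot {chosen}")
    return lines
```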
>
>>
>>
>> *c) .tsx file editor*
>>
>> Mockup:
>>
>> TSX Viewer
>>
>> http://i.imgur.com/pVdsIem.png
>>
>> Templates
>>
>> http://i.imgur.com/hFGIQHR.png
>>
>> Functions:
>>
>>    - Add new tags:
>>       - categories
>>       - multi-categories
>>       - forbid
>>       - enforce
>>       - prefer
>>    - Templates for adding new tags.
>>    - Change the order of the tags within the same parent tag (more
>>    specific categories must be defined before more general ones). The nodes
>>    in the XML viewer can also be made draggable within the same parent node
>>    to make reordering easier.
>>    - Search within tags for faster navigation.
>>    - Validate the tagger definition.
>>    - Editor features such as syntax highlighting, auto-indentation and tag
>>    completion in the Node Contents text view, for complex in-place manual
>>    editing.
>>
>>
> Pretty good. A more human-oriented representation of the information
> (what the symbols mean) would also be good to have in this interface. And
> if we can produce examples that show what a coarse tag groups together, or
> what a forbid, enforce or prefer rule does, that would be brilliant as well.
>
>
I have addressed this by adding a simple help button to the template
interface, which will give information about the currently selected tag.


>
>
>> Looking forward to hearing from you. :)
>>
>>
>>
> Sorry again for the delay, and don't forget to add your proposal.
>


Thanks for the feedback. I have added these changes to my proposal at
http://wiki.apertium.org/wiki/User:Mihirrege/GSOC_2013_Application_-_Interface_for_creating_tagged_corpora
If possible, could you go through it and review it for me?

Regards,
Mihir Rege

>> Regards,
>>
>
> Best,
>
> Gema Ramírez.
>
>
>>  Mihir Rege,
>>
>> Second Year Undergraduate,
>>
>> Department of Computer Science and Engineering,
>>
>> IIT Kharagpur.
>>
>>
>>
>> _______________________________________________
>> Apertium-stuff mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>
>>
>
>
> --
> Gema Ramírez
> ---------------------
> Prompsit LE
> Translate, extract, analyse: http://aplica.prompsit.com
>
>
>
>


-- 
Mihir Rege