Hi Mihir,

thank you for the changes. The proposal in the wiki looks good. Make sure
we are able to review it by posting it on the Google Summer of Code
site <http://www.google-melange.com/gsoc/homepage/google/gsoc2013>.
If you don't submit it to Google, we can't review it!

Best,

Gema.


On Fri, May 3, 2013 at 1:12 AM, Mihir Rege <[email protected]> wrote:

> Hi Gema,
>
> Thanks for the input. I have described inline the changes I made in
> response to your comments.
>
> On Tue, Apr 30, 2013 at 4:26 PM, Gema Ramírez-Sánchez
> <[email protected]> wrote:
>
>> Dear Mihir,
>>
>> thank you for your effort, see my comments inline:
>>
>>
>> On Sun, Apr 28, 2013 at 6:17 PM, Mihir Rege <[email protected]> wrote:
>>
>>> Hi everybody,
>>>
>>> I have been working on the idea of creating an interface for tagged
>>> corpora for the past few days. I have finished the coding challenge, have
>>> gone through the documentation PDF and wikis, and am currently working on
>>> designing the interface. I am posting the mockups for the interface so
>>> that I can get your opinion and improve on them.
>>>
>>
>> Great!
>>
>>
>>> There are currently three major interfaces:
>>>
>>> a) Manual disambiguator
>>>
>>> b) .prob evaluator
>>>
>>> c) .tsx file editor
>>>
>>> I have put up all the mockups together at
>>> http://imgur.com/a/4uk4q#r5Ur8jT and have also put links in separate
>>> sections.
>>>
>>>
>>>
>> Let's take a look. I like the idea of having access, in one interface, to
>> all the tasks involved in working with taggers. It is difficult to
>> organise because the process can be non-linear, but it is still a good idea.
>>
>>
>>> *a) Manual disambiguator*
>>>
>>> Mockup: http://i.imgur.com/r5Ur8jT.png
>>>
>>> Functions:
>>>
>>>    - Jump to the next ambiguous lexical unit or to an adjacent lexical unit
>>>    using the keyboard or mouse.
>>>    - A quick view, bound to a key, to hide the tags and show the raw text
>>>    - If the .tsx file is provided, information such as the coarse tags and
>>>    the applicable forbid and enforce rules can also be displayed.
>>>    - Show statistics of disambiguation
>>>    - Compile and apply constraint grammar rules to the buffer
>>>    - List the applied constraint grammar rules
>>>    - Train and test the tagger (a prompt will ask the part of the
>>>    corpus to be used as testing data).
>>>    - Train the tagger and export the .prob file
>>>    - Save progress (this will save the corpus and also create a
>>>    project description file which will keep track of the morphological
>>>    analyser and .tsx files used, so that it is easier to resume tagging)
>>>    - The interface will be keyboard centric, though it will be equally
>>>    functional with a mouse.
>>>    - Default keymaps will be provided and the bindings can be changed
>>>    to suit the user
>>>
>>> For example
>>>
>>> [P] - <previous-ambiguous>
>>>
>>> [N] - <next-ambiguous>
>>>
>>> [F] - <forward-word>
>>>
>>> [B] - <back-word>
>>>
>>> [1], [2], [3], [4] for choosing the correct lexical form.
>>>
>>
>>
>> Cool functions, all very useful for the end-user of this tool. I think
>> that one more function (if possible) would be really useful: select one or
>> more words and see the output of the morphological analyser for them and be
>> able to change the tagged corpus according to the morphological output.
>> This will help a lot to re-tag the corpus to make it consistent with
>> changes in dictionaries.
>>
>>
> I have added two new features for keeping the corpus consistent
> with changes in the dictionaries.
> 1> A selection of one or more words can be re-analysed by the
> morphological analyser and then re-tagged as usual for an ambiguous
> lexical form.
> 2> On reopening a previously analysed corpus, the tool checks whether each
> of the tags is present in the current analysis of the word. It will also
> pop up an alert if the corpus has become misaligned due to the addition of a
> multiword in the dictionary.
>
>
>>> *Evaluating the tagger*
>>>
>>> Functions
>>>
>>>
>>>    - The trained tagger can be evaluated immediately by having an
>>>    option of setting aside x% of the corpus as testing data.
>>>    - Otherwise, it can be evaluated with the .prob evaluator on an
>>>    unrelated corpus.
>>>
>> That's fine. Often what we want is to compare two .prob files to
>> see whether the "improved" one does better than the "old" one. We should
>> maybe provide an easy way to do that: test with two .prob files and show
>> the differences to the user.
>>
>>
> I have addressed this by allowing multiple .prob files to be evaluated on
> the same corpus, with a comparison shown after the evaluation.
>
>>> *Loading the corpus*
>>>
>>> The available options are:
>>>
>>>    1. Load a raw-text file, morphological analyser and .tsx file (optional)
>>>    2. Continue on an existing project
>>>    3. Pull a wiki-dump and use it as the corpus [ http://i.imgur.com/F9OXMs4.png ]
>>>
>>>
>>>
>> I think we should add something about the language. The user should be
>> able to define the language of the raw-text file or of the Wikipedia dump,
>> then have a list of available pairs or languages (if any) to create the
>> first tagged but non-disambiguated version of the corpus.
>>
>
>
>
> After one of the options is selected, a dialog for choosing the
> morphological analyser will show up, listing the available language pairs.
>
>>
>>>
>>>
>>> *b) .prob evaluator*
>>>
>>> Mockup: http://i.imgur.com/fIo6rV9.png
>>>
>>> Functions
>>>
>>>    - Input the .prob file, the manually disambiguated corpus, and either
>>>    the morphologically analysed corpus or the morphological analyser for
>>>    the language.
>>>    - Evaluate the .prob file and display statistics about tagger
>>>    accuracy
>>>    - Generate a log file, which will basically be the diff between the
>>>    provided tagged corpus and the corpus disambiguated by the tagger,
>>>    making it easier to frame new sentences to add to the corpus, so as to
>>>    give more context to the tagger
>>>
>>>
>> Great, see my comment about comparing two .probs above.
>>
>>
>>>
>>>
>>> *c) .tsx file editor*
>>>
>>> Mockup:
>>>
>>> TSX Viewer
>>>
>>> http://i.imgur.com/pVdsIem.png
>>>
>>> Templates
>>>
>>> http://i.imgur.com/hFGIQHR.png
>>>
>>> Functions:
>>>
>>>    - Add new tags
>>>       - categories
>>>       - multi-categories
>>>       - forbid
>>>       - enforce
>>>       - prefer
>>>    - Templates for adding new tags
>>>    - Change the order of the tags (as more specific categories must be
>>>    defined before more general ones) within the same parent tag. The nodes
>>>    in the XML viewer can also be made draggable within the same parent node
>>>    to make it easier to change the order
>>>    - Search within tags for faster navigation.
>>>    - Validate the tagger definition
>>>    - Editor features like syntax highlighting, auto-indentation and
>>>    tag completion for manual editing in the Node Contents textview for
>>>    complex in-place editing.
>>>
>>>
>> Pretty good. A more human-oriented representation of the information
>> (what the symbols mean) would also be a good thing to have in this
>> interface. And if we can produce examples that show what a coarse tag
>> groups or what a forbid, enforce or prefer rule does, that would be
>> brilliant as well.
>>
>>
> I have addressed this by adding a simple help button to the template
> interface, which will give information about the currently selected tag.
>
>
>>
>>
>>> Looking forward to hearing from you. :)
>>>
>>>
>>>
>> Sorry again for the delay, and don't forget to add your proposal.
>>
>
>
> Thanks for the feedback. I have updated my proposal with the changes:
> http://wiki.apertium.org/wiki/User:Mihirrege/GSOC_2013_Application_-_Interface_for_creating_tagged_corpora
> If possible, could you go through it and review it for me?
>
> Regards,
> Mihir Rege
>
>>> Regards,
>>>
>>
>> Best,
>>
>> Gema Ramírez.
>>
>>
>>>  Mihir Rege,
>>>
>>> Second Year Undergraduate,
>>>
>>> Department of Computer Science and Engineering,
>>>
>>> IIT Kharagpur.
>>>
>>>
>>>
>>> _______________________________________________
>>> Apertium-stuff mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>
>>>
>>
>>
>> --
>> Gema Ramírez
>> ---------------------
>> Prompsit LE
>> Traduce, extrae, analiza: http://aplica.prompsit.com
>>
>>
>>
>>
>
>
> --
> Mihir Rege
>
>
>
>
>


-- 
Gema Ramírez
---------------------
Prompsit LE
Traduce, extrae, analiza: http://aplica.prompsit.com