Hi Mihir,
thank you for the changes. The proposal in the wiki looks good. Make sure
we will be able to review it by submitting it on the Google Summer of Code
site <http://www.google-melange.com/gsoc/homepage/google/gsoc2013>:
if you don't submit it to Google, we can't review it!
Best,
Gema.
On Fri, May 3, 2013 at 1:12 AM, Mihir Rege <[email protected]> wrote:
> Hi Gema,
>
> Thanks for the input. I have described inline the changes I made in
> response to your comments.
>
> On Tue, Apr 30, 2013 at 4:26 PM, Gema Ramírez-Sánchez
> <[email protected]> wrote:
>
>> Dear Mihir,
>>
>> thank you for your effort, see my comments inline:
>>
>>
>> On Sun, Apr 28, 2013 at 6:17 PM, Mihir Rege <[email protected]> wrote:
>>
>>> Hi everybody,
>>>
>>> I have been working on the idea of creating an interface for tagged
>>> corpora for the past few days. I have finished the coding challenge, have
>>> gone through the documentation PDF and wikis, and am currently working on
>>> designing the interface. I am posting the mockups for the interface so
>>> that I can get your opinion and improve on them.
>>>
>>
>> Great!
>>
>>
>>> There are currently three major interfaces:
>>>
>>> a) Manual disambiguator
>>>
>>> b) .prob evaluator
>>>
>>> c) .tsx file editor
>>>
>>> I have put up all the mockups together at
>>> http://imgur.com/a/4uk4q#r5Ur8jT and have also put links in separate
>>> sections.
>>>
>>>
>>>
>> Let's take a look. I like the idea of having access to all the tasks
>> involved in working with taggers in one interface. It is difficult to
>> organise because the process can be non-linear, but it is still a good idea.
>>
>>
>>> *a) Manual disambiguator*
>>>
>>> Mockup: http://i.imgur.com/r5Ur8jT.png
>>>
>>> Functions:
>>>
>>> - Jump to next ambiguous lexical unit or adjacent lexical-unit using
>>> the keyboard or mouse.
>>> - A quick-view bound to a key, to hide the tags and show the raw text
>>> - If the .tsx file is provided, information such as the coarse tags and
>>> the applicable forbid and enforce rules can also be displayed.
>>> - Show statistics of disambiguation
>>> - Compile and apply constraint grammar rules to the buffer
>>> - List the applied constraint grammar rules
>>> - Train and test the tagger (a prompt will ask the part of the
>>> corpus to be used as testing data).
>>> - Train the tagger and export the .prob file
>>> - Save progress (this will save the corpus and also create a
>>> project description file which will keep track of the morphological
>>> analyser and .tsx files used, so that it is easier to resume tagging)
>>> - The interface will be keyboard centric, though it will be equally
>>> functional with a mouse.
>>> - Default keymaps will be provided and the bindings can be changed
>>> to suit the user
>>>
>>> For example
>>>
>>> [P] - <previous-ambiguous>
>>>
>>> [N] - <next-ambiguous>
>>>
>>> [F] - <forward-word>
>>>
>>> [B] - <back-word>
>>>
>>> [1], [2], [3], [4] for choosing the correct lexical form.
>>>
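
A minimal sketch of how the rebindable keymap described above could be represented. The action names come from the example bindings; the `choose-form-N` names, `DEFAULT_KEYMAP` and `build_keymap` are hypothetical and only illustrate the idea of defaults plus user overrides.

```python
# Default bindings from the mockup; "choose-form-N" names are assumed here.
DEFAULT_KEYMAP = {
    "P": "previous-ambiguous",
    "N": "next-ambiguous",
    "F": "forward-word",
    "B": "back-word",
    "1": "choose-form-1",
    "2": "choose-form-2",
    "3": "choose-form-3",
    "4": "choose-form-4",
}


def build_keymap(user_overrides=None):
    """Return the default keymap with the user's bindings applied on top.

    Rebinding an action removes the key it was previously bound to,
    so each action stays reachable through exactly one key.
    """
    keymap = dict(DEFAULT_KEYMAP)
    for key, action in (user_overrides or {}).items():
        for old_key in [k for k, a in keymap.items() if a == action]:
            del keymap[old_key]
        keymap[key] = action
    return keymap


# Example: a user who prefers "J" for jumping to the next ambiguous unit.
keymap = build_keymap({"J": "next-ambiguous"})
```

Keeping the defaults in one table and layering overrides on top means a saved user configuration only needs to list the keys that differ.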
>>
>>
>> Cool functions, all very useful for the end-user of this tool. I think
>> that one more function (if possible) would be really useful: select one or
>> more words and see the output of the morphological analyser for them and be
>> able to change the tagged corpus according to the morphological output.
>> This will help a lot to re-tag the corpus to make it consistent with
>> changes in dictionaries.
>>
>>
> I have added two new features for keeping the corpus consistent with
> changes in the dictionaries:
> 1. A selection of one or more words can be re-analysed by the
> morphological analyser and then re-tagged in the same way as any other
> ambiguous lexical form.
> 2. On reopening a previously analysed corpus, the tool checks whether each
> of the tags is present in the current analysis of the word. It will also
> show an alert if the corpus has become misaligned because a multiword was
> added to the dictionary.
>
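
A rough sketch of the check in point 2, assuming the analyser output is in the Apertium stream format (`^surface/analysis1/analysis2$`) and that the saved corpus is available as a list of (surface form, chosen analysis) pairs. The function names and the saved-corpus representation are assumptions for illustration, and escaped characters in the stream are ignored for brevity.

```python
import re

# One lexical unit in the stream format: ^surface/analysis1/analysis2$
LU_RE = re.compile(r"\^([^$]*)\$")


def parse_analysed(text):
    """Return a list of (surface, [analyses]) from analyser output."""
    units = []
    for body in LU_RE.findall(text):
        surface, *analyses = body.split("/")
        units.append((surface, analyses))
    return units


def check_corpus(saved, analysed_text):
    """Compare a saved tagged corpus with a fresh analysis of the same text.

    saved: list of (surface, chosen_analysis) from a previous session.
    Returns a dict with 'aligned' (False if the tokenisation changed, for
    example after a multiword was added) and 'stale' (entries whose chosen
    analysis is no longer offered by the analyser).
    """
    current = parse_analysed(analysed_text)
    if [s for s, _ in saved] != [s for s, _ in current]:
        return {"aligned": False, "stale": []}
    stale = [
        (i, surface, chosen)
        for i, ((surface, chosen), (_, analyses)) in enumerate(zip(saved, current))
        if chosen not in analyses
    ]
    return {"aligned": True, "stale": stale}


analysed = "^the/the<det><def><sp>$ ^books/book<n><pl>/book<vblex><pri><p3><sg>$"
saved = [("the", "the<det><def><sp>"), ("books", "book<n><sg>")]
result = check_corpus(saved, analysed)
```

Here the saved tag for "books" is no longer among the analyses, so it would be flagged for re-tagging.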
>
>>> *Evaluating the tagger*
>>>
>>> Functions
>>>
>>>
>>> - The trained tagger can be evaluated immediately by having an
>>> option of setting aside x% of the corpus as testing data.
>>> - Otherwise, it can be evaluated with the .prob evaluator on an
>>> unrelated corpus.
>>>
>> That's fine. Often what we want to do is compare two .probs to
>> see if the "improved" one is doing better than the "old" one. We should
>> maybe provide an easy way to do this: test with two .probs and show the
>> differences to the user.
>>
>>
> I have addressed this by allowing multiple .prob files to be evaluated on
> the same corpus, with a comparison shown once they have been evaluated.
>
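
A small sketch of the evaluation logic discussed above: setting aside x% of the corpus as test data and comparing the output of several .prob files against the hand-tagged reference. It assumes each tagger's output has already been reduced to one tag per token, aligned with the gold tags; running the tagger itself is out of scope, and all names are hypothetical.

```python
def split_corpus(sentences, test_percent):
    """Set aside the last test_percent% of the sentences as test data."""
    n_test = round(len(sentences) * test_percent / 100)
    cut = len(sentences) - n_test
    return sentences[:cut], sentences[cut:]


def accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def compare_taggers(gold, outputs):
    """Score each tagger and list the tokens on which the taggers disagree.

    outputs: dict mapping a label (e.g. a .prob file name) to its tags.
    """
    scores = {name: accuracy(gold, tags) for name, tags in outputs.items()}
    disagreements = []
    for i, g in enumerate(gold):
        choices = {name: tags[i] for name, tags in outputs.items()}
        if len(set(choices.values())) > 1:
            disagreements.append((i, g, choices))
    return scores, disagreements


train, test = split_corpus(list(range(10)), 20)
gold = ["det", "n", "vblex", "n"]
outputs = {
    "old.prob": ["det", "vblex", "vblex", "n"],
    "new.prob": ["det", "n", "vblex", "n"],
}
scores, disagreements = compare_taggers(gold, outputs)
```

The list of disagreements is what the interface could show to the user when comparing an "improved" .prob with an "old" one.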
>>> *Loading the corpus*
>>>
>>> The available options are:
>>>
>>> 1. Load a raw-text file, morphological analyser and .tsx file (optional)
>>> 2. Continue on an existing project
>>> 3. Pull a wiki-dump and use it as the corpus
>>> [ http://i.imgur.com/F9OXMs4.png ]
>>>
>>>
>>>
>> I think we should add something about the language. The user should be
>> able to define the language of the raw-text file or of the wikipedia dump,
>> then have a list of available pairs or languages (if any) to create the
>> first tagged but non-disambiguated version of the corpus.
>>
>
>
>
> After one of the options is selected, a dialog for choosing the
> morphological analyser will show up, listing the available language pairs.
>
>>
>>>
>>>
>>> *b) .prob evaluator*
>>>
>>> Mockup: http://i.imgur.com/fIo6rV9.png
>>>
>>> Functions
>>>
>>> - Input the .prob file, the manually disambiguated corpus along
>>> with the morphologically analysed corpus or the morphological analyser
>>> for the language.
>>> - Evaluate the .prob file and display statistics about tagger
>>> accuracy
>>> - Generate a log file, which will basically be the diff between the
>>> provided tagged corpus and the corpus disambiguated by the tagger,
>>> making it easier to frame new sentences to add to the corpus, so as to
>>> give more context to the tagger
>>>
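
One possible way to produce the diff-style log mentioned above, sketched with Python's standard `difflib`. It assumes both corpora are available with one tagged lexical unit per line; the sample units and the `tagging_diff` name are made up for illustration.

```python
import difflib


def tagging_diff(gold_units, tagger_units):
    """Return a unified diff between the hand-tagged and tagger output.

    Both arguments are lists of tagged lexical units, one per entry.
    """
    return list(
        difflib.unified_diff(
            gold_units,
            tagger_units,
            fromfile="gold",
            tofile="tagger",
            lineterm="",
        )
    )


gold = ["^the<det><def><sp>$", "^book<n><pl>$", "^be<vbser><pri><p3><pl>$"]
tagger = ["^the<det><def><sp>$", "^book<vblex><pri><p3><sg>$", "^be<vbser><pri><p3><pl>$"]
log = tagging_diff(gold, tagger)
print("\n".join(log))
```

Lines starting with `-` show the hand-tagged reading and lines starting with `+` show what the tagger chose, with the neighbouring units kept as context.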
>>>
>> Great, see my comment about comparing two .probs above.
>>
>>
>>>
>>>
>>> *c) .tsx file editor*
>>>
>>> Mockup:
>>>
>>> TSX Viewer
>>>
>>> http://i.imgur.com/pVdsIem.png
>>>
>>> Templates
>>>
>>> http://i.imgur.com/hFGIQHR.png
>>>
>>> Functions:
>>>
>>> - Add new tags
>>> - categories
>>> - multi-categories
>>> - forbid
>>> - enforce
>>> - prefer
>>> - Templates for adding new tags
>>> - Change the order of the tags (as more specific categories must be
>>> defined before more general ones) within the same parent tag. The nodes
>>> in the xml viewer can also be made draggable within the same parent node
>>> to make it easier to change the order
>>> - Search within tags for faster navigation.
>>> - Validate the tagger definition
>>> - Editor features like syntax highlighting, auto-indentation and
>>> tag completion for manual editing in the Node Contents textview for
>>> complex in-place editing.
>>>
>>>
>> Pretty good. A more human-oriented representation of the information
>> (the meaning of the symbols) could also be a good thing to have in this
>> interface. And if we can produce examples that show what a coarse tag
>> groups, or what a forbid, enforce or prefer rule is doing, that would be
>> brilliant as well.
>>
>>
> I have addressed this by adding a simple help button to the template
> interface, which will give information about the currently selected tag.
>
>
>>
>>
>>> Looking forward to hearing from you. :)
>>>
>>>
>>>
>> Sorry again for the delay, and don't forget to submit your proposal.
>>
>
>
> Thanks for the feedback. I have updated my proposal with these changes at
> <http://wiki.apertium.org/wiki/User:Mihirrege/GSOC_2013_Application_-_Interface_for_creating_tagged_corpora>.
> If possible, could you go through it and review it for me?
>
> Regards,
> Mihir Rege
>
>>> Regards,
>>>
>>
>> Best,
>>
>> Gema Ramírez.
>>
>>
>>> Mihir Rege,
>>>
>>> Second Year Undergraduate,
>>>
>>> Department of Computer Science and Engineering,
>>>
>>> IIT Kharagpur.
>>>
>>>
>>>
>>> _______________________________________________
>>> Apertium-stuff mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>>
>>>
>>
>>
>> --
>> Gema Ramírez
>> ---------------------
>> Prompsit LE
>> Traduce, extrae, analiza: http://aplica.prompsit.com
>>
>>
>>
>>
>
>
> --
> Mihir Rege
>
>
>
>
>
--
Gema Ramírez
---------------------
Prompsit LE
Traduce, extrae, analiza: http://aplica.prompsit.com