Re: [Apertium-stuff] GSOC Idea: Take a language pair and make it state of the art

Alex Aruj Mon, 17 Mar 2014 13:22:30 -0700

Hello Fran and list,

Thank you for your responses. Regarding the first topic in my last e-mail,
the "visualization" of coverage and quality: I found a graph on the wiki (
http://wiki.apertium.org/wiki/File:Wikipedia-n-zipf.png) that could spark
some ideas about how to illustrate how effective Apertium's language pairs
are, e.g. plotting the number of dictionary entries in a language pair
against its average WER per 1000 words.
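
To make the quantity on the quality axis concrete, here is a minimal sketch
of word error rate: the word-level edit distance between a post-edited
reference and the machine translation, divided by the reference length. It is
an illustration only, not the implementation used by
apertium-eval-translator; the function name and the toy sentences are made up
for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Toy example: one substitution (tiny -> small) and one missing word (is).
print(word_error_rate("the tiny house is red", "the small house red"))  # 0.4
```

One such value per language pair, paired with that pair's dictionary size,
would give the points for the kind of plot described above.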


Some basic questions:

After adding a few entries to the two dictionaries *en-es.en-es.dix* and
*en-es.es.dix* in the en-es pair, I recompiled with lt-comp, using lr for
the analyser and rl for the generator.
I then ran "make" in the apertium-en-es folder.

I was able to analyse my new entry:
apertium@apvb:~/apertium-en-es$ echo "diminuto" | lt-proc es.analyser.bin
*^diminuto/diminuto<adj><m><sg>$*

Yet I am still not able to translate the word:
apertium@apvb:~/apertium-en-es$ echo "diminuto" | apertium es-en
**diminuto*

What steps am I missing for it to produce a translation? (For practice, I
entered "tiny" as a suitable translation.)
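
For saving and inspecting analyser output like the line above, here is a
small sketch that splits text in the Apertium stream format
(^surface/analysis1/analysis2$) into its parts. It is a simplification: it
ignores escaped characters and treats any analysis starting with "*" as the
unknown-word marker. The names LEXICAL_UNIT and parse_units are made up for
this example.

```python
import re

# One lexical unit: ^surface/analysis1/analysis2$
# Simplifying assumption: no escaped '/', '^' or '$' inside the unit.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)((?:/[^/$]*)*)\$")


def parse_units(stream: str):
    """Return a (surface, analyses, known) tuple per lexical unit."""
    units = []
    for match in LEXICAL_UNIT.finditer(stream):
        surface = match.group(1)
        analyses = [a for a in match.group(2).split("/") if a]
        # Assumption: an analysis beginning with '*' means the word is unknown.
        known = bool(analyses) and not any(a.startswith("*") for a in analyses)
        units.append((surface, analyses, known))
    return units


print(parse_units("^diminuto/diminuto<adj><m><sg>$"))
```

Run over a whole analysed text, the share of units with known set to True
gives a rough coverage figure for the analyser that produced the output.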

Btw, some background information:
I am working in VirtualBox. I downloaded the language pair from svn and
placed it in /home/apertium/apertium-en-es/. I compiled without changing
any package configuration or installation prefix, i.e. it stayed as
/usr/local. I checked that the pairs were available using the command: apertium -l

I edited the two dictionaries from /home/apertium/apertium-en-es/.
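
Since the edited dictionaries live in the working copy while the prefix is
/usr/local, one thing worth checking is which data directory a translation
run reads. The sketch below wraps the apertium command and optionally passes
its -d option to point at a specific directory. The helper names are made up,
and it is an assumption (not verified on this setup) that the working copy
contains the compiled modes and binaries for es-en after "make".

```python
import subprocess


def build_translate_cmd(direction, data_dir=None):
    """Build the apertium command line, optionally with an explicit data dir."""
    cmd = ["apertium"]
    if data_dir is not None:
        # -d makes apertium look for modes and compiled data in data_dir
        # instead of the installed location.
        cmd += ["-d", data_dir]
    cmd.append(direction)
    return cmd


def translate(text, direction, data_dir=None):
    """Pipe text through apertium and return its output (needs apertium installed)."""
    result = subprocess.run(
        build_translate_cmd(direction, data_dir),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Comparing translate("diminuto", "es-en") with
translate("diminuto", "es-en", data_dir="/home/apertium/apertium-en-es")
would show whether the installed copy and the working copy behave
differently.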

Please correct me if I am supposed to make changes directly to the files
in /usr/local, or point me to a resource on the wiki showing the next steps,
so that I can move on with the challenge before submitting my application.
Though I have not been able to read and absorb all of the MT workshop
sessions, they have been super helpful:
http://wiki.apertium.eu/index.php/Main_Page

Thanks!
Alex


On 15 March 2014 17:44, Francis Tyers <[email protected]> wrote:

> On Sat, 15 Mar 2014 at 00:47 -0700, Alex Aruj wrote:
> > Hi all,
> >
> >
> > What might some of the deliverables look like for this task?
> >
> > Deliverable examples: a list of words added to improve coverage,
> > rules added to handle erroneous target constructions, and
> > source texts with post-edited translations used as reference, as
> > recommended in the coding challenge.
> > Could another deliverable be code to graph quality, e.g. the word
> > coverage of a language against its WER, and another correlation (correct
> > me if this is misguided) between transfer rules and the PWER? I can
> > probably set up some visualization of quality in Octave/Matlab if it is
> > not available yet.
>
> Don't quite get this...
>
> >
> > From
> http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Make_a_language_pair_state-of-the-art
> :
> > This will involve working with dictionaries, transfer rules,
> > scripting, corpora.
> >
> > Is scripting using the calls to lt-proc, apertium-eval-translator and
> > other tools in lt-toolbox?
>
> Scripting probably means something like format conversion and data
> generation.
>
> > I am trying to figure out how to run the more verbose
> > lttoolbox scripts to get POS tags and rules at the command line, and also
> > save the output to text files. I will continue to troubleshoot on
> > IRC, so as not to balloon this e-mail message.
>
> Cool!
>
> > So far, I have evaluated the translation of a ~800 word article from
> > elpais into English and found some issues with future tense ("realizar
> > and some vocab. Though the coverage was well over 90%, the grammar
> > could be much better, but I need to see the tags and rules used at
> > command line. This could be easier for me to do in Apertium-Viewer,
> > but I prefer the control of command line.
>
> Cool :)
>
> > For sure, I would want to save the steps, commands and measures I use
> > to improve quality. Developing a mini-wiki to this effect, as a step
> > toward getting others to develop their pairs to a competitive level of
> > quality, could be nice, if it doesn't already exist!
>
> You could keep notes in a subpage of your userpage on our Wiki :)
>
> Fran
>
>
>
>
> ------------------------------------------------------------------------------
> Learn Graph Databases - Download FREE O'Reilly Book
> "Graph Databases" is the definitive new guide to graph databases and their
> applications. Written by three acclaimed leaders in the field,
> this first edition is now available. Download your free book today!
> http://p.sf.net/sfu/13534_NeoTech
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>



-- 
Alex

