On Tue, 09 Oct 2012 at 19:24 +0200, [email protected] wrote:
> On Tue, Oct 09, 2012 at 02:14:42PM +0000, Francis Tyers wrote:
> > On Tue, 09 Oct 2012 at 15:14 +0200, [email protected] wrote:
> > > On Tue, Oct 09, 2012 at 09:41:41AM +0200, Per Tunedal wrote:
> > 
> > As a first pass, I would try adding semantic information in a new
> > module. It is the easiest way to not step on anyone's toes. If you make
> > something that works, and we have a language pair that can make use of
> > it, then we can see how to integrate it.
> 
> Hmm, I am not sure how to read this. Did you mean "Fran" when you wrote
> "I will try", or a more impersonal person (could be myself...)? At first
> I read it as "Fran" and I was very happy, but with more careful and
> pessimistic eyes it could be read as the latter.

As I mentioned, I'm not interested in using WordNet, because wordnets
don't exist for most languages. I'm interested in methods that can be
applied to any language.

So yes, it was an impersonal "I would" ;) 

> Anyway, I agree with you that a module would be the way forward.
> And I would happily contribute and experiment and write code and
> data once I know what to do. I would very much appreciate some initial help.

Here is what I would do:

* Take the Spanish--English language pair.
* Extract the Spanish->English translations from the bilingual dictionary:

$ lt-comp rl apertium-en-es.en-es.dix es-en-ambig.bin
$ lt-expand apertium-en-es.en-es.dix | grep -v ':>:' | sed 's/:<:/:/g' | \
    cut -f2 -d':' | sed 's/^/^/g' | sed 's/$/$/g' | \
    lt-proc -b es-en-ambig.bin > es-en-ambig.txt

Example output:

^política<n><f>/politics<n>/policy<n>$
^contaminación<n><f>/contamination<n>/pollution<n>$
^cerdo<n><m>/pig<n>/pork<n>$
^puerto<n><m>/port<n>/haven<n>$
^retrato<n><m>/portrait<n>/portrayal<n>$
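As far as I can tell, the pipeline above prints a line for every expanded
entry, including words with only one translation. If you only want the
ambiguous ones (two or more translations, like the examples), something
like this might do. It is only a sketch: `keep_ambiguous` is a made-up
helper name, and it assumes no lemma or tag contains a literal '/'.

```shell
# Hypothetical helper (not part of Apertium): keep only bilingual entries
# with two or more translations, i.e. lines of the form
#   ^source/trans1/trans2...$
# Splitting on '/' gives one field for the source plus one per translation,
# so three or more fields means the entry is ambiguous.
# Assumes no lemma or tag contains a literal '/'.
keep_ambiguous() {
    awk -F'/' 'NF >= 3' "$@"
}
```

For example `keep_ambiguous es-en-ambig.txt | sort -u`, where `sort -u`
drops any lines that come out repeated.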

* Make a text file which maps each Spanish->English translation to its
  WordNet senses (synsets).

I just looked up "política":

http://adimen.si.ehu.es/cgi-bin/wei/public/wei.consult.perl?item=pol%C3%ADtica&button1=Look_up&metode=Word&pos=Nouns&llengua=Spanish_3.0&search=nearest&estructura=English_3.0&glos=Gloss&levin=1&eng-30=English_3.0

And the synsets seem to be: 

"policy": eng-30-06656408-n eng-30-05901508-n
"politics": eng-30-13840719-n eng-30-06148148-n eng-30-00611972-n

So perhaps your text file could look like:

^política<n><f>/politics<n>$ | eng-30-13840719-n eng-30-06148148-n eng-30-00611972-n
^política<n><f>/policy<n>$ | eng-30-06656408-n eng-30-05901508-n
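To use a file like that from a module, it may be handy to flatten it into
one translation/synset pair per line. A minimal awk sketch, assuming each
entry is on one line, the two halves are separated by exactly " | ", and
synset IDs are separated by spaces (the function name is hypothetical):

```shell
# Hypothetical helper: print one "translation<TAB>synset" pair per synset
# from a sense-mapping file whose lines look like
#   ^política<n><f>/policy<n>$ | eng-30-06656408-n eng-30-05901508-n
synsets_by_translation() {
    awk -F' [|] ' '{
        # $1 is "^source/translation$", $2 the space-separated synset IDs
        split($1, lu, "/")
        trans = lu[2]
        sub(/[$]$/, "", trans)          # drop the trailing "$"
        n = split($2, ids, " ")
        for (i = 1; i <= n; i++) print trans "\t" ids[i]
    }' "$@"
}
```

Keying on the English side alone loses the Spanish word; if the same
English word translates several Spanish words, you might prefer to key on
the whole ^source/translation$ string instead.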

Then I would write a module which reads the output of lexical
transfer...

$ echo "Según los monetaristas, el banco central puede aumentar la \
inversión y el consumo si aplica esta política y baja la tasa de \
interés. " | apertium -d . es-en-pretransfer | lt-proc -b es-en-ambig.bin
^Según<pr>/According to<pr>$ ^el<det><def><m><pl>/the<det><def><m><pl>$
^monetarista<adj><mf><pl>/monetarist<adj><pl>$^,<cm>/,<cm>$
^el<det><def><m><sg>/the<det><def><m><sg>$ ^banco<n><m><sg>/bank<n><sg>$
^central<adj><mf><sg>/central<adj><sg>$
^poder<vbmod><pri><p3><sg>/can<vaux><pri><p3><sg>$
^aumentar<vblex><inf>/augment<vblex><inf>/increase<vblex><inf>/magnify<vblex><inf>/heighten<vblex><inf>/rise<vblex><inf>/hike# up<vblex><sep><inf>$
^el<det><def><f><sg>/the<det><def><f><sg>$
^inversión<n><f><sg>/investment<n><sg>$ ^y<cnjcoo>/and<cnjcoo>$
^el<det><def><m><sg>/the<det><def><m><sg>$
^consumo<n><m><sg>/consumption<n><sg>$ ^si<cnjadv>/if<cnjadv>$
^aplicar<vblex><pri><p3><sg>/apply<vblex><pri><p3><sg>$
^este<det><dem><f><sg>/this<det><dem><f><sg>$
^política<n><f><sg>/politics<n><sg>/policy<n><sg>$ ^y<cnjcoo>/and<cnjcoo>$
^baja<n><f><sg>/drop<n><sg>$ ^el<det><def><f><sg>/the<det><def><f><sg>$
^tasa<n><f><sg>/tax<n><sg>/rate<n><sg>$ ^de<pr>/of<pr>/from<pr>$
^interés<n><m><sg>/interest<n><sg>$^.<sent>/.<sent>$^.<sent>/.<sent>$

And run your algorithm on it.
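Just to make "the module" concrete, here is a rough skeleton. It reads the
lexical-transfer stream on stdin, copies the blanks between units
unchanged, and rewrites each lexical unit so that it keeps exactly one
translation. The `choose` function is where your algorithm would go; as a
placeholder it just keeps the first candidate. All names are made up, and
it assumes no escaped \^, \$ or \/ inside units.

```shell
# Hypothetical module skeleton (names are made up, not an Apertium API):
# reads lexical-transfer output on stdin and writes the same stream with
# exactly one translation per lexical unit. Blanks between units are
# copied unchanged. Assumes units contain no escaped \^, \$ or \/.
resolve_units() {
    awk '
    # Placeholder for "your algorithm": given the source lexical form,
    # the candidate translations cands[1..n] and their count n, return
    # the translation to keep. Here we simply keep the first candidate.
    function choose(src, cands, n) {
        return cands[1]
    }
    {
        out = ""
        line = $0
        while (match(line, /\^[^$]*[$]/)) {
            out = out substr(line, 1, RSTART - 1)        # text before the unit
            unit = substr(line, RSTART + 1, RLENGTH - 2)  # strip ^ and $
            n = split(unit, parts, "/")
            if (n < 2) {
                out = out "^" unit "$"                    # nothing to choose from
            } else {
                split("", cands)                          # clear previous candidates
                for (i = 2; i <= n; i++) cands[i - 1] = parts[i]
                out = out "^" parts[1] "/" choose(parts[1], cands, n - 1) "$"
            }
            line = substr(line, RSTART + RLENGTH)
        }
        print out line                                    # trailing text, if any
    }'
}
```

It would sit at the end of the command above, e.g.
`... | lt-proc -b es-en-ambig.bin | resolve_units`. Awk keeps it close to
the one-liners in this mail; a real module would probably want a proper
language and full handling of the stream format's escaping rules.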

Good luck!

> > > > Maybe there are similar algos around for related applications, like
> > > > spell checking, statistical translation (tree model or even factored
> > > > translation, look at Moses), speech recognition or even artificial
> > > > intelligence? And you mentioned finding the shortest way. Someone might
> > > > have an idea of where to look for algos? There might be some open source
> > > > code to copy or be inspired by.
> > 
> > http://ixa2.si.ehu.es/ukb/
> 
> Thanks. Well, it is not just homonyms, I see.
> And the article does report significant improvements.
> 
> > > > BTW Just like you I'm into this just for the fun of it. I will only work
> > > > with things that are of great interest to me. Primarily, I like to
> > > > solve problems. Or help others to solve theirs.
> > > 
> > > Yes, agree. I think that also discussing on the list without doing
> > > commits is contributing to the Apertium project.
> > > And I would like to contribute more (I have done some commits already),
> > > but I am stuck on commits because I am not getting advice from more
> > > seasoned people. In my limited understanding, I think I need guidance
> > > so as not to hurt the overall system or the specific language pair I
> > > am working on, and not to violate Apertium design principles.
> > 
> > Here is what I think:
> > 
> > * For Swedish-Danish this will be unnecessary.
> 
> Why? I think there is enough difference between the two languages to
> try it out.

I think there aren't enough problems of lexical selection to make it a
worthwhile pursuit compared to (a) improving dictionary coverage, (b)
improving morphological disambiguation.

Fran


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
