Re: [Apertium-stuff] Word selection by sens was: Re: Adding Swedish nouns from SALDO to da-se was: Re: Danish - Swedish Nouns

Francis Tyers Tue, 09 Oct 2012 14:21:08 -0700

On Tue, 09 Oct 2012 at 23:16 +0200, [email protected] wrote:
> On Tue, Oct 09, 2012 at 06:18:30PM +0000, Francis Tyers wrote:
> > On Tue, 09 Oct 2012 at 20:07 +0200, [email protected] wrote:
> > > On Tue, Oct 09, 2012 at 05:44:54PM +0000, Francis Tyers wrote:
> > > > On Tue, 09 Oct 2012 at 19:24 +0200, [email protected] wrote:
> > > > > On Tue, Oct 09, 2012 at 02:14:42PM +0000, Francis Tyers wrote:
> > > > > > On Tue, 09 Oct 2012 at 15:14 +0200, [email protected] wrote:
> > > > > > > On Tue, Oct 09, 2012 at 09:41:41AM +0200, Per Tunedal wrote:
> > > > > > 
> > > > > > As a first pass, I would try adding semantic information in a new
> > > > > > module. It is the easiest way to not step on anyone's toes. If you
> > > > > > make something that works, and we have a language pair that can make
> > > > > > use of it, then we can see how to integrate it.
> > > > > 
> > > > > Hmm, I am not sure how to read this. Did you mean "Fran" when you
> > > > > wrote "I will try", or a more impersonal person (could be myself...)?
> > > > > At first I read it as "Fran" and I was very happy, but with more
> > > > > careful and pessimistic eyes it could be read as the latter.
> > > > 
> > > > As I mentioned, I'm not interested in using WordNets, as they don't
> > > > exist for most languages. I'm interested in methods that can be applied
> > > > to any language.
> > > > 
> > > > So yes, it was an impersonal "I would" ;) 
> > > 
> > > :-( Anyway, I hope you and others can guide me or even help with some
> > > initial steps.
> > > 
> > > > > Anyway, I agree with you that a module would be the way forward.
> > > > > And I would happily contribute and experiment and write code and
> > > > > data once I know what to do. I would very much appreciate some
> > > > > initial help.
> > > > 
> > > > Here is what I would do:
> > > > 
> > > > * Take the Spanish--English language pair
> > > > * Extract words from Spanish->English from the bilingual dictionary.
> > > 
> > > Thanks for the outline. However, I have only very little knowledge of
> > > Spanish, so I don't think I can contribute here.
> > 
> > Then do the magic and replace "Spanish" with "Swedish" and "English"
> > with "Danish". It will work the same way.
> 
> 
> yes, I was aware of that.
> 
> > > (snip)
> > > 
> > > > 
> > > > And do your algorithm on it. 
> > > 
> > > Where should I build the algorithm? In a standalone module, or in some API?
> > > What would be the hooks? Surely I need to be able to access the
> > > monodixes and bidix in some database form?
> > 
> > No API, no databases. 
> > 
> > Do it in a standalone module. You have access to the output of the
> > lexical-transfer stage as I described in my previous email. This is all
> > the information you will need. If you want to get the information from
> > the bidix and monodix in a text-format, you can use the "lt-expand"
> > tool. 
> 
> OK, I seem to remember using this before. Lots of data...


$ lt-expand apertium-sv-da.sv.dix > /tmp/blah ; ls -lsrth /tmp/blah
1,4M -rw-r--r-- 1 fran fran 1,4M oct  9 21:20 /tmp/blah

$ lt-expand apertium-sv-da.da.dix > /tmp/blah ; ls -lsrth /tmp/blah
3,2M -rw-r--r-- 1 fran fran 3,2M oct  9 21:20 /tmp/blah

If your algorithm can't cope with 5M of data, then I think it probably
has a fundamental design problem.
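
To give an idea of how little is needed to work with that file, here is a
minimal Python sketch, not part of Apertium. It assumes each expanded line
has the shape "surface:lemma<tag><tag>", with ":>:" or ":<:" instead of ":"
for entries restricted to one direction. The function names parse_line and
count_by_first_tag are made up for this example, and multi-part entries
joined with "+" are not handled.

```python
import re
from collections import Counter

# Assumed line shape: "surface:lemma<tag><tag>", with ":>:" or ":<:" in place
# of ":" for entries restricted to one translation direction.
LINE_RE = re.compile(r"^(.*?)(:[<>]:|:)(.*)$")
TAG_RE = re.compile(r"<([^<>]+)>")


def parse_line(line):
    """Split one expanded entry into (surface, lemma, tags), or None."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    surface, _separator, lexical_form = match.groups()
    lemma = lexical_form.split("<", 1)[0]
    return surface, lemma, TAG_RE.findall(lexical_form)


def count_by_first_tag(lines):
    """Count entries per first tag, which is usually the part of speech."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[2]:
            counts[parsed[2][0]] += 1
    return counts
```

Passing an open file handle, for instance
count_by_first_tag(open("/tmp/blah", encoding="utf-8")), reads the data one
line at a time, so a few megabytes are no problem.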

> > > > > > * For Swedish-Danish this will be unnecessary.
> > > > > 
> > > > > Why? I think there is enough difference between the two languages to 
> > > > > try it out.
> > > > 
> > > > I think there aren't enough problems of lexical selection to make it a
> > > > worthwhile pursuit compared to (a) improving dictionary coverage, (b)
> > > > improving morphological disambiguation.
> > > 
> > > The case is that I would like to do more things in one go.
> > 
> > Yes, this means that it will never get done. Remember the Unix
> > philosophy. Do one thing, and do it well.[1] And /worse is better/.[2]
> > 
> > > I do not want to update the monodixes once and then do it once more.
> > 
> > Better to do it twice than to do it 0 times. 
> 
> 
> Well, well. I just want some advice before doing something silly, or even
> damaging.

You can't damage anything so long as you don't commit. If you are
worrying about damaging something and want to commit, then make a
branch.

> > > I have 49000 Swedish nouns to add, and I would like to have them added with
> > > SALDO links in them. I risk losing all coordination between the words and
> > > the meanings if I do it in two steps.
> > 
> > You keep the link using the lexical forms.
> 
> Not understood. How?

A lexical form will give you the same information as including the
information in the monodix. You can use it like I showed before with the
output of lexical transfer.
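
One way to picture this is a side table outside the dictionaries, keyed by
lexical form. The Python sketch below is only an illustration under stated
assumptions: the stream shape "^source<tags>/target<tags>/target<tags>$",
the table SENSES, the identifier "fil..1" and the function names are all
made up for the example, and backslash escapes in the stream are ignored.

```python
import re

# Assumed stream shape: ^source<tags>/target1<tags>/target2<tags>$
# Backslash escapes inside lexical units are ignored in this sketch.
UNIT_RE = re.compile(r"\^([^$]*)\$")

# Hypothetical side table: source lexical form -> sense identifier.
# It lives outside the monodix, so the dictionaries stay untouched.
SENSES = {
    "fil<n><ut>": "fil..1",
}


def lexical_units(text):
    """Yield (source lexical form, list of target lexical forms)."""
    for match in UNIT_RE.finditer(text):
        source, *targets = match.group(1).split("/")
        yield source, targets


def annotate(text, senses):
    """Attach the sense identifier, or None, to every lexical unit."""
    return [(source, targets, senses.get(source))
            for source, targets in lexical_units(text)]
```

Because the key is the lexical form itself, the table can be built and
changed independently of the 49000 dictionary entries.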

> > > Would adding the links with a "ref" tag be OK, or what would be
> > > recommended?
> > > And an "id" tag to record the meaning id?
> > 
> > No. 
> 
> Then what would you recommend if I want to keep all the data together in the
> monodixes?

As a first attempt I would not do that. 

Fran


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
