On Tue, Oct 09, 2012 at 05:44:54PM +0000, Francis Tyers wrote:
> On Tue, Oct 09, 2012 at 19:24 +0200, [email protected] wrote:
> > On Tue, Oct 09, 2012 at 02:14:42PM +0000, Francis Tyers wrote:
> > > On Tue, Oct 09, 2012 at 15:14 +0200, [email protected] wrote:
> > > > On Tue, Oct 09, 2012 at 09:41:41AM +0200, Per Tunedal wrote:
> > >
> > > As a first pass, I would try adding semantic information in a new
> > > module. It is the easiest way to not step on anyone's toes. If you make
> > > something that works, and we have a language pair that can make use of
> > > it, then we can see how to integrate it.
> >
> > Hmm, I am not sure how to read this. Did you mean "Fran" when you wrote
> > "I will try", or a more impersonal person (could be myself...)? At first I
> > read it as "Fran" and I was very happy, but with more careful and
> > pessimistic eyes it could be read as the latter.
>
> As I mentioned, I'm not interested in using WordNets, as they don't exist
> for most languages. I'm interested in methods that can be applied to any
> language.
>
> So yes, it was an impersonal "I would" ;)
:-(

Anyway, I hope you and others can guide me, or even help with some initial steps.

> > Anyway, I agree with you that a module would be the way forward.
> > And I would happily contribute, experiment, and write code and
> > data once I know what to do. I would very much appreciate some initial
> > help.
>
> Here is what I would do:
>
> * Take the Spanish--English language pair
> * Extract words from Spanish->English from the bilingual dictionary.

Thanks for the outline. However, I have very little knowledge of Spanish, so I don't think I can contribute here.

(snip)
>
> And do your algorithm on it.

Where should I build the algorithm? In a standalone module, or in some API? What would the hooks be? Surely I need to be able to access the monodixes and the bidix in some database form?

> > > * For Swedish-Danish this will be unnecessary.
> >
> > Why? I think there is enough difference between the two languages to try it
> > out.
>
> I think there aren't enough problems of lexical selection to make it a
> worthwhile pursuit compared to (a) improving dictionary coverage, (b)
> improving morphological disambiguation.

The thing is, I would like to do several things in one go. I do not want to update the monodixes once and then do it all over again. I have 49000 Swedish nouns to add, and I would like to add them with SALDI links included. I risk losing all coordination between the words and their meanings if I do it in two steps. Would adding the links with a "ref" tag be OK, or what would be recommended? And an "id" tag to record the meaning ID?

best regards
keld
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
