Hey everybody. After 10 days mostly out in nature without a computer, and just before another 8 weeks without a permanent internet connection (largely by choice), I want to give my opinion, as a new pair developer, on the discussion about what dictionaries should contain.
1) For monodixes, I fully agree with Fran and some others that all useful information should be there, even if some of it is not used by every pair. Doing so generally means writing a complete paradigm once and then reusing it hundreds or thousands of times for the main ones, so it is not a big burden.

2) For bidixes, the most natural way to build them is to write something like:

  <e><p><l>my_word<s n="kind1"/></l><r>my_translation<s n="kind2"/></r></p></e>

where kind1 and kind2 are often the same and can be derived from the name of the paradigm used in the monodix. I mention this because I quickly realised that adding a new line by typing the correct XML syntax into a file with more than 40,000 other lines quickly becomes painful. So I wrote a shell script taking 4 parameters to generate new lines, and another one to insert these lines at the right place. I think many pair developers have their own scripts that do the same thing, or something similar, to build a bidix when the monodixes are available.

So, when bidix lines are written as above, it would be better if no other <s n="something"/> tags were needed. Of course, there are exceptions where extra tags give nice results, as in the fr-es pair:

  <e><p><l>coma<s n="n"/><s n="m"/></l><r>coma<s n="n"/><s n="m"/></r></p></e>
  <e><p><l>virgule<s n="n"/><s n="f"/></l><r>coma<s n="n"/><s n="f"/></r></p></e>

or

  <e><p><l>composant<s n="n"/><s n="m"/></l><r>componente<s n="n"/><s n="m"/></r></p></e>
  <e><p><l>composante<s n="n"/><s n="f"/></l><r>componente<s n="n"/><s n="f"/></r></p></e>

But having to write (in the eo-fr pair)

  <e><p><l>ABC<s n="np"/><s n="al"/></l><r>ABC<s n="np"/><s n="al"/><s n="mf"/></r></p></e>

without forgetting any <s n="al"/>, or the <s n="mf"/> needed to avoid getting a # in the translation, is not a very pleasant way to work. There is of course the problem of beginners not doing this and asking on the list why it does not work, but that can be learned quickly.
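The 4-parameter script mentioned above is not shown in the message. As a rough idea of what such a generator could look like, here is a minimal Python sketch (not the author's script). The helper names, the assumption that the 4 parameters are left lemma, left paradigm, right lemma and right paradigm, and the alphabetical ordering used to find "the right place" are all hypothetical.

```python
import bisect
import re
from xml.sax.saxutils import escape


def tag_from_paradigm(paradigm: str) -> str:
    """Return the part of a paradigm name after '__' (e.g. 'livre__n' -> 'n')."""
    _, sep, kind = paradigm.rpartition("__")
    if not sep or not kind:
        raise ValueError(f"paradigm name without '__kind' suffix: {paradigm!r}")
    return kind


def lemma_xml(lemma: str) -> str:
    # Escape XML special characters; blanks in multiword lemmas become <b/>.
    return escape(lemma).replace(" ", "<b/>")


def make_entry(l_lemma: str, l_par: str, r_lemma: str, r_par: str) -> str:
    """Build one bidix line from 4 parameters (hypothetical helper)."""
    l_tag = tag_from_paradigm(l_par)
    r_tag = tag_from_paradigm(r_par)
    return (f'<e><p><l>{lemma_xml(l_lemma)}<s n="{l_tag}"/></l>'
            f'<r>{lemma_xml(r_lemma)}<s n="{r_tag}"/></r></p></e>')


_LEFT = re.compile(r"<l>(.*?)<")


def sort_key(entry: str) -> str:
    """Left-hand lemma of an entry, lower-cased, used as the ordering key."""
    m = _LEFT.search(entry)
    return m.group(1).lower() if m else ""


def insert_sorted(lines: list[str], entry: str) -> None:
    """Insert entry into lines, assuming lines are already sorted by sort_key."""
    keys = [sort_key(line) for line in lines]
    pos = bisect.bisect_right(keys, sort_key(entry))
    lines.insert(pos, entry)
```

For example, make_entry("virgule", "abeille__n", "coma", "abeja__n") returns <e><p><l>virgule<s n="n"/></l><r>coma<s n="n"/></r></p></e>, i.e. the one-tag form described above; any gender tags would still have to be added by hand.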
But the biggest problem is being obliged to do this almost always, and ending up with longer and slightly less readable lines in the bidix. I think that even in this case (a gender change):

  <e><p><l>ajout<s n="n"/><s n="m"/></l><r>adición<s n="n"/><s n="f"/></r></p></e>

there should be no need to give the gender if the word is not ambiguous in either language (as coma and componente are in Spanish). And of course something like:

  <e r="LR"><p><l>binaire<s n="adj"/><s n="mf"/></l><r>binario<s n="adj"/><s n="GD"/></r></p></e>
  <e r="RL"><p><l>binaire<s n="adj"/><s n="mf"/></l><r>binario<s n="adj"/><s n="f"/></r></p></e>
  <e r="RL"><p><l>binaire<s n="adj"/><s n="mf"/></l><r>binario<s n="adj"/><s n="m"/></r></p></e>

could be simplified to a single line.

So the question is how to achieve this without breaking things.

Solution 1: paradigms

Several people mentioned this, but without details. I notice that the <s n="kind"/> information in bidixes can generally be derived from the name of the paradigm used in the monodix, which looks like "something__kind" (or "foo__bar" if you prefer). But of course, "kind" carries less information than "something__kind". So a nice approach would be, for each paradigm of every monodix, to build a paradigm with the same name in the bidix containing only an invariant list of tags, like:

  <s n="thing1"/><s n="thing2"/>

That way, even gender ambiguities like the one for the Spanish word coma could be solved elegantly:

  <e><p><l>coma<s n="livre__n"/></l><r>coma<s n="abismo__n"/></r></p></e>
  <e><p><l>virgule<s n="abeille__n"/></l><r>coma<s n="abeja__n"/></r></p></e>

Solution 2: during compilation

This is another approach. When compiling bidix files, there are two cases:
- the information is given in a <s n="thing"/>, so it is simply used;
- the information is not given, so it is taken from the monodix.

Have a good summer.
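Solution 2 can be illustrated with a minimal Python sketch, written as a separate pre-processing step rather than as a change to the real compiler. It assumes monodix entries of the usual form <e lm="..."><i>...</i><par n="something__kind"/></e> and, as a simplification, recovers only the category from the "__kind" suffix of the paradigm name, not gender or other tags. The function names lemma_paradigms and resolve_tags are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def lemma_paradigms(monodix_xml: str) -> dict[str, set[str]]:
    """Map each lemma (the lm attribute) to the paradigm names it uses."""
    root = ET.fromstring(monodix_xml)
    result: dict[str, set[str]] = defaultdict(set)
    for e in root.iter("e"):
        lemma = e.get("lm")
        par = e.find("par")
        # Entries inside <pardefs> have no lm attribute, so they are skipped.
        if lemma is not None and par is not None:
            result[lemma].add(par.get("n"))
    return dict(result)


def resolve_tags(lemma: str, explicit_tags: list[str],
                 paradigms: dict[str, set[str]]) -> list[str]:
    # Case 1: the bidix already gives <s n="..."/> tags, so use them as-is.
    if explicit_tags:
        return list(explicit_tags)
    # Case 2: no tags given, so derive the category from the monodix paradigm
    # names (simplification: only the "__kind" suffix is used).
    kinds = {p.rpartition("__")[2] for p in paradigms.get(lemma, set())}
    if len(kinds) != 1:
        raise ValueError(f"cannot infer tags for {lemma!r}: {sorted(kinds)}")
    return [kinds.pop()]
```

For the Spanish coma, which appears under two paradigms with the same n suffix, this sketch can only infer n, so the gender would still have to be written explicitly in the bidix, which fits the ambiguous cases described above.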
--------------------------------
Bernard Chardonneau (France)

Phone: [33] 1 64 90 87 04 (from September to June, except holidays)
GSM phone: [33] 6 49 95 13 95 (French school holidays, zone C)

Multilingual websites for my free software:
http://libremail.free.fr and http://libremail.tuxfamily.org
http://cyloop.tuxfamily.org (mainly translated with Apertium)

My general website (in French only): http://bech.free.fr

_______________________________________________
Apertium-stuff mailing list
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
