Hey everybody.

After 10 days spent mostly in nature without a computer, and just before
another 8 weeks without a permanent internet connection (largely by choice),
I want to give my opinion, as a new pair developer, on the discussion
about what dictionaries should contain.

1) For monodixes, I fully agree with Fran and some others that all
useful information should be there, even if it is not used by several
pairs.

Since doing that generally means writing a complete paradigm once and
then reusing it hundreds or thousands of times for the main ones, it is
not a big problem.
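
To make the "write once, reuse many times" point concrete, here is a toy Python model. It is not Apertium's real XML format or compiler: a paradigm is simply a list of (suffix, tags) pairs, and each entry supplies only a stem and a paradigm name. The `PARADIGMS` table and the `expand` function are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy stand-in for a monodix paradigm: a list of (suffix, tags) pairs.
# Real Apertium paradigms are <pardef> XML elements; this model is hypothetical.
Paradigm = List[Tuple[str, List[str]]]

PARADIGMS: Dict[str, Paradigm] = {
    "abeja__n": [
        ("a", ["n", "f", "sg"]),
        ("as", ["n", "f", "pl"]),
    ],
}


def expand(stem: str, paradigm_name: str) -> List[Tuple[str, str]]:
    """Return (surface form, analysis) pairs for one stem and one paradigm."""
    paradigm = PARADIGMS[paradigm_name]
    lemma = stem + paradigm[0][0]  # the first suffix gives the lemma form
    pairs = []
    for suffix, tags in paradigm:
        analysis = lemma + "".join(f"<{tag}>" for tag in tags)
        pairs.append((stem + suffix, analysis))
    return pairs


# The paradigm is written once and reused for every entry that shares it.
for stem in ["abej", "cas", "mes"]:
    for surface, analysis in expand(stem, "abeja__n"):
        print(surface, analysis)
```

Adding a new noun then costs one short entry, however many forms the paradigm describes.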

2) For bidixes, the most natural way to build them is to write entries
like:

<e><p><l>my_word<s n="kind1"/></l><r>my_translation<s n="kind2"/></r></p></e>

where kind1 and kind2 are often the same and can be built from the
name of the paradigm used in the monodix.

I mention this because I quickly realised that adding a new entry by
typing the right XML syntax in a file with more than 40,000 other lines
soon becomes painful.
So I wrote a 4-parameter shell script to generate new lines, and another
one to put these lines in the right place.
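
The shell scripts themselves are not reproduced here. As a rough illustration only, here is a Python sketch of the same two steps. It assumes that the four parameters are the two lemmas and the two monodix paradigm names, that the kind is whatever follows the last "__" in a paradigm name, and that "the right place" means keeping entries sorted by left lemma. All function names are hypothetical.

```python
import bisect
import re
from typing import List


def kind_from_paradigm(paradigm_name: str) -> str:
    """'something__kind' -> 'kind' (assumes the monodix naming convention)."""
    _, sep, kind = paradigm_name.rpartition("__")
    if not sep:
        raise ValueError(f"paradigm name without '__': {paradigm_name!r}")
    return kind


def make_entry(left: str, left_par: str, right: str, right_par: str) -> str:
    """Step 1: build one bidix line from the 4 parameters."""
    return (f'<e><p><l>{left}<s n="{kind_from_paradigm(left_par)}"/></l>'
            f'<r>{right}<s n="{kind_from_paradigm(right_par)}"/></r></p></e>')


# Captures the left lemma of an entry line, with or without an r="..." attribute.
LEFT_LEMMA = re.compile(r"<e[^>]*><p><l>([^<]*)")


def left_lemma(line: str) -> str:
    match = LEFT_LEMMA.search(line)
    return match.group(1) if match else ""


def insert_sorted(lines: List[str], new_line: str) -> List[str]:
    """Step 2: insert new_line so entries stay ordered by left lemma.

    Assumes `lines` holds only <e> lines already sorted by left lemma,
    compared in plain code-point order (no locale-aware collation).
    """
    keys = [left_lemma(line) for line in lines]
    pos = bisect.bisect_right(keys, left_lemma(new_line))
    return lines[:pos] + [new_line] + lines[pos:]


section = [
    make_entry("abeille", "abeille__n", "abeja", "abeja__n"),
    make_entry("virgule", "abeille__n", "coma", "abeja__n"),
]
section = insert_sorted(section, make_entry("livre", "livre__n", "libro", "abismo__n"))
for line in section:
    print(line)
```

Code-point order puts accented letters after "z", so a real tool for French or Spanish would probably want a proper collation.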

I think a lot of pair developers have their own shell scripts to do the
same thing, or something similar, to build a bidix when the monodixes are
available.

So, to keep bidix lines like the one above, it would be better if no
other <s n="something"/> tags were needed.

Of course, there are exceptions where extra tags give nice results,
as in the fr-es pair:

<e><p><l>coma<s n="n"/><s n="m"/></l><r>coma<s n="n"/><s n="m"/></r></p></e>
<e><p><l>virgule<s n="n"/><s n="f"/></l><r>coma<s n="n"/><s n="f"/></r></p></e>

or

<e><p><l>composant<s n="n"/><s n="m"/></l><r>componente<s n="n"/><s n="m"/></r></p></e>
<e><p><l>composante<s n="n"/><s n="f"/></l><r>componente<s n="n"/><s n="f"/></r></p></e>

But having to write (in the eo-fr pair)
<e><p><l>ABC<s n="np"/><s n="al"/></l><r>ABC<s n="np"/><s n="al"/><s n="mf"/></r></p></e>
without forgetting any <s n="al"/>, or the <s n="mf"/> that prevents
a # in the translation, is not a very nice way to work.

There is of course the problem of beginners not doing this and
asking on the list why it does not work. But that can be learned
quickly.

But the biggest problem is having to do this almost always, and
ending up with longer and somewhat less readable lines in the bidix.

I think that even in this case, where the gender changes:
<e><p><l>ajout<s n="n"/><s n="m"/></l><r>adición<s n="n"/><s n="f"/></r></p></e>
there should be no need to give the gender when there is no word
ambiguity in either language (such as the one for coma and componente
in Spanish).

And of course something like:
<e r="LR"><p><l>binaire<s n="adj"/><s n="mf"/></l><r>binario<s n="adj"/><s n="GD"/></r></p></e>
<e r="RL"><p><l>binaire<s n="adj"/><s n="mf"/></l><r>binario<s n="adj"/><s n="f"/></r></p></e>
<e r="RL"><p><l>binaire<s n="adj"/><s n="mf"/></l><r>binario<s n="adj"/><s n="m"/></r></p></e>

would be simpler as a single line.

So the question is how to achieve this without breaking anything.


Solution 1: paradigms

Several people mentioned it, but without details.
I notice that the <s n="kind"/> information in bidixes can generally
be generated from the name of the paradigm used in the monodix,
which looks like "something__kind" (or "foo__bar" if you prefer).

But of course, there is less information in "kind" than in
"something__kind".

So a nice approach would be, for each paradigm of every monodix, to
build a paradigm with the same name in the bidix containing just an
invariant list of tags, like:

<s n="thing1"/><s n="thing2"/>

That way, even gender ambiguities such as the one for the Spanish word
coma could be solved elegantly:

<e><p><l>coma<s n="livre__n"/></l><r>coma<s n="abismo__n"/></r></p></e>
<e><p><l>virgule<s n="abeille__n"/></l><r>coma<s n="abeja__n"/></r></p></e>
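
As an illustration only, here is a Python sketch of how a tool could expand such paradigm-named tags back into ordinary tags. It assumes a hypothetical `BIDIX_PARADIGMS` table mapping each monodix paradigm name to its invariant tag list; the table contents are made up for the example, and this is not an existing Apertium tool.

```python
import re
from typing import Dict, List

# Hypothetical table: monodix paradigm name -> invariant bidix tags.
BIDIX_PARADIGMS: Dict[str, List[str]] = {
    "livre__n": ["n", "m"],
    "abeille__n": ["n", "f"],
    "abismo__n": ["n", "m"],
    "abeja__n": ["n", "f"],
}

TAG = re.compile(r'<s n="([^"]+)"/>')


def expand_tags(text: str) -> str:
    """Replace each <s n="paradigm"/> found in the table by its tag list."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name not in BIDIX_PARADIGMS:
            return match.group(0)  # ordinary tag such as <s n="np"/>: keep it
        return "".join(f'<s n="{tag}"/>' for tag in BIDIX_PARADIGMS[name])
    return TAG.sub(repl, text)


entry = '<e><p><l>virgule<s n="abeille__n"/></l><r>coma<s n="abeja__n"/></r></p></e>'
print(expand_tags(entry))
# <e><p><l>virgule<s n="n"/><s n="f"/></l><r>coma<s n="n"/><s n="f"/></r></p></e>
```

Because names missing from the table are left alone, ordinary tags keep working, so both styles could coexist in the same bidix.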


Solution 2: during compilation

That's another approach. When compiling bidix files, there are two cases:
- a piece of information is given in a <s n="thing"/>, so just use it;
- the information is not given, so it is taken from the
  monodix.
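
A minimal Python sketch of these two cases, assuming a toy monodix index (`ES_MONODIX`, hypothetical) and assuming that the tags written in the bidix are always a prefix of the full monodix tag list. When more than one monodix entry fits, it refuses to guess:

```python
from typing import Dict, List, Optional

# Hypothetical toy index of a Spanish monodix: lemma -> every tag list
# its entries can give. "coma" and "componente" exist in both genders.
ES_MONODIX: Dict[str, List[List[str]]] = {
    "adición": [["n", "f"]],
    "coma": [["n", "m"], ["n", "f"]],
    "componente": [["n", "m"], ["n", "f"]],
}


def resolve_tags(lemma: str, bidix_tags: Optional[List[str]],
                 monodix: Dict[str, List[List[str]]]) -> List[str]:
    """Complete the tags of one bidix side at compile time.

    Tags written in the bidix are kept (case 1); anything missing is taken
    from the monodix (case 2), provided exactly one monodix entry fits.
    """
    given = bidix_tags or []
    # Keep only monodix tag lists that start with what the bidix says.
    matches = [tags for tags in monodix.get(lemma, [])
               if tags[:len(given)] == given]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"{lemma!r}: no monodix entry matches {given}")
    raise ValueError(f"{lemma!r}: ambiguous, the bidix must give more tags")


print(resolve_tags("adición", ["n"], ES_MONODIX))    # ['n', 'f']
print(resolve_tags("coma", ["n", "m"], ES_MONODIX))  # ['n', 'm']
```

With this rule, the ajout/adición entry needs no gender, while coma still needs one, which matches the idea of giving the gender only when a word is ambiguous.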


Have a good summer.

--------------------------------
Bernard Chardonneau (France)
Phone: [33] 1 64 90 87 04 (from September to June, except holidays)
GSM phone: [33] 6 49 95 13 95 (French school holidays, zone C)

Multilingual websites for my free software:
http://libremail.free.fr and http://libremail.tuxfamily.org
http://cyloop.tuxfamily.org (mainly translated with Apertium)

My general website (in French only)
http://bech.free.fr

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
