On 29 November 2012 13:38, Miquel Esplà <[email protected]> wrote:
> Hi everybody!
>
> I'm performing some experiments on the dictionaries of the apertium-es-ca
> system. Since I need to remove duplicates of paradigms and entries, I used
> the tool fix in apertium-dixtools. When I did so, I realised that there are
> a lot of duplicate entries in the monolingual dictionaries and most of them
> look like this:
> <e lm="oportunamente" a="prompsit2adn"><i>oportunamente</i><par
> n="ahora__adv"/></e>
>
> <e r="RL" lm="oportunamente" a="prompsit2adn"><i>oportunamente</i><par
> n="ahora__adv"/></e>
>
> As you can see the entries are the same, but there is a RL restriction in
> one of them (which I cannot understand). Fran suggested that this could be
> an error caused by the automatic addition of entries when using a web form.

Web form, or otherwise automated, is a pretty safe assumption.
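
To list such pairs before deleting anything, a rough Python sketch along
these lines might help. It is not part of apertium-dixtools: the function
names and the <section> wrapper in the sample are hypothetical, and it
assumes that two entries count as duplicates only when everything except
the r attribute matches exactly.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def entry_key(entry, ignore):
    """Return a comparable key for an <e> element, skipping attributes in `ignore`."""
    attrs = tuple(sorted((k, v) for k, v in entry.attrib.items() if k not in ignore))
    # Serialise the children; strip() drops whitespace around each child.
    body = "".join(ET.tostring(child, encoding="unicode").strip() for child in entry)
    return attrs, body


def redundant_restricted(xml_text):
    """List entries with an r attribute that also exist without any restriction."""
    groups = defaultdict(list)
    for entry in ET.fromstring(xml_text).iter("e"):
        groups[entry_key(entry, {"r"})].append(entry)
    flagged = []
    for entries in groups.values():
        # Only flag restricted entries when an unrestricted twin is present.
        if any("r" not in e.attrib for e in entries):
            flagged.extend(e for e in entries if "r" in e.attrib)
    return flagged


# Hypothetical wrapper element around the two entries quoted in this thread.
SAMPLE = """<section>
<e lm="oportunamente" a="prompsit2adn"><i>oportunamente</i><par n="ahora__adv"/></e>
<e r="RL" lm="oportunamente" a="prompsit2adn"><i>oportunamente</i><par n="ahora__adv"/></e>
</section>"""

for e in redundant_restricted(SAMPLE):
    print(e.get("lm"), e.get("r"))
```

For a real dictionary, the text could be read from the .dix file instead of
the inline sample. The output is a list of candidates for review, not a
list of entries that are safe to delete.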

> So, I wanted to ask you, Is there any reason for which I shouldn't remove
> these entries?
>

With ca.dix and es-ca.dix, you'll have to double check the 'v' attributes:

 <e lm="engolir" a="prompsit"
v="val"><p><l>engol</l><r>engol</r></p><par n="abarat/ir__vblex"/></e>
-<e lm="engolir" a="prompsit"
v="cat"><p><l>engol</l><r>engol</r></p><par n="abarat/ir__vblex"/></e>

That there is an entry being removed here indicates an error - either
one of the entries should have something different, or the v attribute
should be removed.
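
A quick way to collect pairs like the engolir one is sketched below in
Python. This is only an illustration, not what 'fix' does internally: the
function name and the <section> wrapper are hypothetical, and it only
catches entries whose content is otherwise identical, so it will miss
cases where the two variants differ in surface form.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def variant_only_groups(xml_text):
    """Group <e> elements that are identical apart from their v attribute."""
    groups = defaultdict(list)
    for entry in ET.fromstring(xml_text).iter("e"):
        attrs = tuple(sorted((k, v) for k, v in entry.attrib.items() if k != "v"))
        body = "".join(ET.tostring(c, encoding="unicode").strip() for c in entry)
        groups[(attrs, body)].append(entry)
    # Keep only groups in which more than one distinct v value occurs
    # (a missing v attribute counts as its own value, None).
    return [g for g in groups.values() if len({e.get("v") for e in g}) > 1]


# Hypothetical wrapper element around entries taken from the diff in this mail.
SAMPLE_CA = """<section>
<e lm="engolir" a="prompsit" v="val"><p><l>engol</l><r>engol</r></p><par n="abarat/ir__vblex"/></e>
<e lm="engolir" a="prompsit" v="cat"><p><l>engol</l><r>engol</r></p><par n="abarat/ir__vblex"/></e>
<e lm="patir" v="cat"><i>pat</i><par n="abarat/ir__vblex"/></e>
</section>"""

for group in variant_only_groups(SAMPLE_CA):
    print(group[0].get("lm"), sorted(e.get("v") for e in group))
```

Each group it reports still needs a human decision: either one entry
should differ, or the v attribute should go.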

$ diff -u sort.dix out.dix |grep '^\-.*v='
-  <e v="val"><p><l>egeixi</l>   <r>egir<s n="vblex"/><s n="prs"/><s
n="p3"/><s n="sg"/></r></p></e>
-  <e v="cat"><p><l>eguin</l>    <r>eure<s n="vblex"/><s n="imp"/><s
n="p3"/><s n="pl"/></r></p></e>
-  <e v="cat"><p><l>eguin</l>    <r>eure<s n="vblex"/><s n="imp"/><s
n="p3"/><s n="pl"/><j/></r></p><par n="S__anant"/></e>
-  <e v="cat"><p><l>egui</l>     <r>eure<s n="vblex"/><s n="imp"/><s
n="p3"/><s n="sg"/></r></p></e>
-  <e v="cat"><p><l>egui</l>     <r>eure<s n="vblex"/><s n="imp"/><s
n="p3"/><s n="sg"/><j/></r></p><par n="S__vagi"/></e>
-  <e v="val"><p><l>eguen</l>    <r>eure<s n="vblex"/><s n="imp"/><s
n="p3"/><s n="pl"/></r></p></e>
-  <e v="val"><p><l>eguen</l>    <r>eure<s n="vblex"/><s n="imp"/><s
n="p3"/><s n="pl"/><j/></r></p><par n="S__anant"/></e>
-  <e v="val"><p><l>ega</l>      <r>eure<s n="vblex"/><s n="imp"/><s
n="p3"/><s n="sg"/></r></p></e>
-  <e v="val"><p><l>ega</l>      <r>eure<s n="vblex"/><s n="imp"/><s
n="p3"/><s n="sg"/><j/></r></p><par n="S__vagi"/></e>
-<e r="RL" lm="fam" v="cat"><i>fam</i><par n="accessibilitat__n"/></e>
-<e lm="gana" v="val">    <i>gan</i><par n="abell/a__n"/></e>
-<e r="RL" lm="poal" v="cat"><p><l>poal</l>   <r>poal</r></p><par
n="abric__n"/></e>
-<e lm="rajola" v="cat">  <i>rajol</i><par n="abell/a__n"/></e>
-<e lm="tenda" a="prompsit2uoc" v="cat"><i>tend</i><par n="abell/a__n"/></e>
-<e r="RL" lm="timó" v="val"><i>tim</i><par n="aband/ó__n"/></e>
-<e r="RL" lm="banyar" v="val"><i>bany</i><par n="abander/ar__vblex"/></e>
-<e lm="desbaratar" v="cat"><i>desbarat</i><par n="abander/ar__vblex"/></e>
-<e lm="engolir" a="prompsit"
v="cat"><p><l>engol</l><r>engol</r></p><par n="abarat/ir__vblex"/></e>
-<e lm="patir" v="cat">   <i>pat</i><par n="abarat/ir__vblex"/></e>

...all of these entries (and their counterparts) will need to be
double-checked and fixed before 'fix' can be run. I would suggest
consulting Gema about them.

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff