On 29 November 2012 13:38, Miquel Esplà <[email protected]> wrote:
> Hi everybody!
>
> I'm performing some experiments on the dictionaries of the apertium-es-ca
> system. Since I need to remove duplicates of paradigms and entries, I used
> the tool fix in apertium-dixtools. When I did so, I realised that there are
> a lot of duplicate entries in the monolingual dictionaries, and most of them
> look like this:
>
> <e lm="oportunamente" a="prompsit2adn"><i>oportunamente</i><par n="ahora__adv"/></e>
>
> <e r="RL" lm="oportunamente" a="prompsit2adn"><i>oportunamente</i><par n="ahora__adv"/></e>
>
> As you can see, the entries are the same, but there is an RL restriction in
> one of them (which I cannot understand). Fran suggested that this could be
> an error caused by the automatic addition of entries when using a web form.
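Before deleting anything, one way to see how many of these r="RL" duplicates there are is to group the entries that become identical once the `r` attribute is ignored. This is a minimal standalone sketch, not part of apertium-dixtools, assuming Python 3.8+ (for `xml.etree.ElementTree.canonicalize`) and a monodix small enough to load into memory. The function names are hypothetical, and the sample is a trimmed-down dictionary built from the two entries quoted above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def entry_fingerprint(entry, ignore):
    """Canonical XML for one <e> entry, with the attribute names in `ignore`
    dropped (on every element of the entry) and whitespace around text stripped."""
    # .strip() removes the entry's tail (the newline/indentation after </e>).
    raw = ET.tostring(entry, encoding="unicode").strip()
    # C14N sorts attributes, so attribute order in the file does not matter.
    return ET.canonicalize(raw, strip_text=True, exclude_attrs=ignore)


def find_restriction_duplicates(dix_xml):
    """Group <section> entries that are identical apart from their `r` attribute.
    Entries inside <pardefs> are not looked at."""
    root = ET.fromstring(dix_xml)
    groups = defaultdict(list)
    for section in root.iter("section"):
        for entry in section.iter("e"):
            groups[entry_fingerprint(entry, {"r"})].append(entry)
    return [entries for entries in groups.values() if len(entries) > 1]


# Trimmed-down monodix: a real file also has <alphabet>, <sdefs> and <pardefs>.
SAMPLE = """<dictionary><section id="main" type="standard">
<e lm="oportunamente" a="prompsit2adn"><i>oportunamente</i><par n="ahora__adv"/></e>
<e r="RL" lm="oportunamente" a="prompsit2adn"><i>oportunamente</i><par n="ahora__adv"/></e>
<e lm="ahora"><i>ahora</i><par n="ahora__adv"/></e>
</section></dictionary>"""

for group in find_restriction_duplicates(SAMPLE):
    # Prints the lemma and the r value of each entry in document order.
    print(group[0].get("lm"), [e.get("r") for e in group])
```

The check only reports groups and deletes nothing, so each one can be reviewed by hand before deciding which copy to keep.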
Web form, or otherwise automated, is a pretty safe assumption.

> So, I wanted to ask you: is there any reason why I shouldn't remove
> these entries?
>

With ca.dix and es-ca.dix, you'll have to double-check the 'v' attributes:

<e lm="engolir" a="prompsit" v="val"><p><l>engol</l><r>engol</r></p><par n="abarat/ir__vblex"/></e>
-<e lm="engolir" a="prompsit" v="cat"><p><l>engol</l><r>engol</r></p><par n="abarat/ir__vblex"/></e>

That an entry is being removed here indicates an error: either one of the
entries should have something different, or the v attribute should be removed.

$ diff -u sort.dix out.dix | grep '^\-.*v='
- <e v="val"><p><l>egeixi</l> <r>egir<s n="vblex"/><s n="prs"/><s n="p3"/><s n="sg"/></r></p></e>
- <e v="cat"><p><l>eguin</l> <r>eure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="pl"/></r></p></e>
- <e v="cat"><p><l>eguin</l> <r>eure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="pl"/><j/></r></p><par n="S__anant"/></e>
- <e v="cat"><p><l>egui</l> <r>eure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="sg"/></r></p></e>
- <e v="cat"><p><l>egui</l> <r>eure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="sg"/><j/></r></p><par n="S__vagi"/></e>
- <e v="val"><p><l>eguen</l> <r>eure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="pl"/></r></p></e>
- <e v="val"><p><l>eguen</l> <r>eure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="pl"/><j/></r></p><par n="S__anant"/></e>
- <e v="val"><p><l>ega</l> <r>eure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="sg"/></r></p></e>
- <e v="val"><p><l>ega</l> <r>eure<s n="vblex"/><s n="imp"/><s n="p3"/><s n="sg"/><j/></r></p><par n="S__vagi"/></e>
-<e r="RL" lm="fam" v="cat"><i>fam</i><par n="accessibilitat__n"/></e>
-<e lm="gana" v="val"> <i>gan</i><par n="abell/a__n"/></e>
-<e r="RL" lm="poal" v="cat"><p><l>poal</l> <r>poal</r></p><par n="abric__n"/></e>
-<e lm="rajola" v="cat"> <i>rajol</i><par n="abell/a__n"/></e>
-<e lm="tenda" a="prompsit2uoc" v="cat"><i>tend</i><par n="abell/a__n"/></e>
-<e r="RL" lm="timó" v="val"><i>tim</i><par n="aband/ó__n"/></e>
-<e r="RL" lm="banyar" v="val"><i>bany</i><par n="abander/ar__vblex"/></e>
-<e lm="desbaratar" v="cat"><i>desbarat</i><par n="abander/ar__vblex"/></e>
-<e lm="engolir" a="prompsit" v="cat"><p><l>engol</l><r>engol</r></p><par n="abarat/ir__vblex"/></e>
-<e lm="patir" v="cat"> <i>pat</i><par n="abarat/ir__vblex"/></e>

...all of these entries (and their counterparts) will need to be double-checked
and fixed before 'fix' can be run. I would suggest consulting Gema about them.

--
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
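Because a deduplication pass can silently collapse entries that differ only in their 'v' attribute, one way to list those pairs before running 'fix' is to compare entries with 'v' ignored. This is a hypothetical sketch, not the rule 'fix' itself uses (which attributes it ignores is not stated here), so the attributes to ignore are a parameter; passing ("v", "r") would also catch pairs such as the lm="fam" entry in the diff above. It assumes Python 3.8+ for `xml.etree.ElementTree.canonicalize`, and the sample reuses the engolir pair.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def fingerprint(entry, ignore):
    """Canonical XML for an <e> entry with the attribute names in `ignore` dropped."""
    raw = ET.tostring(entry, encoding="unicode").strip()  # drop the tail after </e>
    # C14N sorts attributes; strip_text ignores indentation differences.
    return ET.canonicalize(raw, strip_text=True, exclude_attrs=ignore)


def variant_collisions(dix_xml, ignore=("v",)):
    """Return (lemma, [v values]) for <section> entries that become identical
    once the attributes in `ignore` are removed, i.e. entries a duplicate
    removal pass could merge, losing the variant mark."""
    root = ET.fromstring(dix_xml)
    groups = defaultdict(list)
    for section in root.iter("section"):
        for entry in section.iter("e"):
            groups[fingerprint(entry, set(ignore))].append(entry)
    return [(g[0].get("lm"), [e.get("v") for e in g])
            for g in groups.values() if len(g) > 1]


# Trimmed-down monodix containing the engolir pair from the message.
SAMPLE = """<dictionary><section id="main" type="standard">
<e lm="engolir" a="prompsit" v="val"><p><l>engol</l><r>engol</r></p><par n="abarat/ir__vblex"/></e>
<e lm="engolir" a="prompsit" v="cat"><p><l>engol</l><r>engol</r></p><par n="abarat/ir__vblex"/></e>
<e lm="patir"><i>pat</i><par n="abarat/ir__vblex"/></e>
</section></dictionary>"""

print(variant_collisions(SAMPLE))
```

An empty list means no two entries in a <section> would collide; anything else is a candidate for the double-checking described above.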
