Hi,
How about you try this:

lt-expand apertium-swe.swe.dix | grep -E "[^<:>]+:[^<:>]+<n>" |
  sed -E 's/[^<:>]+:([^<:>]+).*/\1/g' | perl -CSD -pe 's/\p{No}//g' | sort -u

Just a small addition to Daniel's earlier command: it deletes all
superscripts before removing duplicates. Hopefully you don't need
superscripts in your lemmas elsewhere. If you do, we can handle those
cases differently.

*Note that I haven't been able to test this on my machine.* But I can't get
Daniel's command to work here either, so that may just be something about
my setup. I expect it should work; try it out and let me know.

Tanmai

On Thu, Apr 23, 2020 at 1:51 PM Per Tunedal <per.tune...@operamail.com>
wrote:

> Hi Kevin,
> thanks for the explanation. So they are homonyms. How do I get rid of
> the duplicates?
> I just want:
>
> tur
>
> Yours,
> Per Tunedal
>
> On Thu, Apr 23, 2020, at 10:00, Kevin Brubeck Unhammer wrote:
>
> "Per Tunedal" <per.tune...@operamail.com>
> wrote:
>
> > Hi Daniel,
> > Thank you! Works like a charm with a small exception.
> >
> > I get some strange duplicates, e.g. tur:
> >
> > tur¹
> > tur²
>
> slump ("chance") vs färd ("journey"); they have different paradigms:
>
> <e c="flaks" lm="tur">  <p><l>tur</l><r>tur¹</r></p><par n="mjölk__n_ut"/></e>
> <e c="gåtur" lm="tur">  <p><l>tur</l><r>tur²</r></p><par n="film__n_ut"/></e>
>
>
> _______________________________________________
> Apertium-stuff mailing list
> Apertium-stuff@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>


-- 
*Khanna, Tanmai*