Hi,

How about you try this:

lt-expand apertium-swe.swe.dix | grep -E "[^<:>]+:[^<:>]+<n>" | sed -E 's/[^<:>]+:([^<:>]+).*/\1/g' | sed -E 's/\p{No}//g' | uniq

Just a small addition to Daniel's earlier command, to delete all superscripts before removing duplicates. Hopefully you don't need superscripts in your lemmas elsewhere. If you do, we can try something else.

*Note that I'm not able to reproduce this on my machine.* But I'm not able to reproduce Daniel's command either, so that might just be something to do with my machine. I'm guessing it should work.

Check it out and let me know.

Tanmai

On Thu, Apr 23, 2020 at 1:51 PM Per Tunedal <per.tune...@operamail.com> wrote:

> Hi Kevin,
> thanks for the explanation. Thus they are homonyms. How do I get rid of
> the duplicates?
> I just want:
>
> tur
>
> Yours,
> Per Tunedal
>
> On Thu, Apr 23, 2020, at 10:00, Kevin Brubeck Unhammer wrote:
> > "Per Tunedal" <per.tune...@operamail.com> wrote:
> >
> > > Hi Daniel,
> > > Thank you! Works like a charm with a small exception.
> > >
> > > I get some strange duplicates like e.g. tur:
> > >
> > > tur¹
> > > tur²
> >
> > slump vs färd, they have different paradigms:
> >
> > <e c="flaks" lm="tur"> <p><l>tur</l><r>tur¹</r></p><par n="mjölk__n_ut"/></e>
> > <e c="gåtur" lm="tur"> <p><l>tur</l><r>tur²</r></p><par n="film__n_ut"/></e>

--
*Khanna, Tanmai*

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff