Hi again, I've found that the solution you suggested doesn't work properly. Some non-existent words are produced in the process and survive the filtering. This got worse when I tried to get adjectives: the list was full of strange words, as well as words from other parts of speech, such as verbs.
I suspected that the expansion produces output that pollutes the result, so I tried working directly on apertium-swe.swe.dix, like this:

    grep "lm=" apertium-swe.swe.dix | grep "__n_" | less

This produced a usable list of nouns. As a side effect, it is also far faster.

Remember, I asked about some very strange Swedish "nouns":

    arna arnas arnas- ars ars-

So far I have not been able to find out where they come from. They are not listed as nouns in apertium-swe.swe.dix.

Among the adjectives I got, for example, the following verbs:

    abbreviera abdikera abonnera abortera

I used:

    lt-expand apertium-swe.swe.dix | grep -E "[^<:>]+:[^<:>]+<adj>" | sed -E 's/[^<:>]+:([^<:>]+).*/\1/g' | sed 's/[¹²³]//g'

Does anyone have a clue?

Yours,
Per Tunedal

On Tue, Apr 28, 2020, at 18:36, Samuel Sloniker wrote:
> egrep and fgrep are deprecated. Use grep -E and grep -F.
>
> On Tue, Apr 28, 2020 at 7:56 AM Per Tunedal <per.tune...@operamail.com> wrote:
>> Hi,
>> thank you all for your kind help. I'm getting the lists I need.
>> Yours
>> Per Tunedal
>>
>> On Mon, Apr 27, 2020, at 20:35, Bernard Chardonneau wrote:
>> > Yes, me I rather do that instead of
>> >
>> > (<vblex>|<vbmod>|<vbser>|<vbhaver>)
>> >
>> > and I also use fgrep and egrep instead of grep -F and grep -E
>> > as it was/(is ?) in UNIX.
>> >
>> > > Date: Sun, 26 Apr 2020 10:40:39 -0700
>> > > From: Samuel Sloniker <scoopgra...@gmail.com>
>> > > To: apertium-stuff@lists.sourceforge.net
>> > > Reply-To: apertium-stuff@lists.sourceforge.net
>> > > Subject: Re: [Apertium-stuff] List of verbs
>> > > Probable attachment(s)
>> > >
>> > > Shouldn't <vb(lex|mod|ser|haver)> also work?
>> > >
>> > > On Fri, Apr 24, 2020 at 7:25 AM Daniel Swanson <awesomeevildu...@gmail.com> wrote:
>> > >
>> > > > Also, to explain the patterns
>> > > >
>> > > > [^<:>]+ is "match any string of characters that doesn't contain a tag or a
>> > > > colon"
>> > > >
>> > > > So the grep is "anything without tags or colons (i.e. a surface form) then
>> > > > a colon then another string (a lemma) then a <n> tag"
>> > > >
>> > > > The sed matches roughly the same thing except it has () around the lemma
>> > > > so it can refer to it later and .* to match whatever tags there may be. \1
>> > > > then replaces the line with the contents of the first (), i.e. the lemma.
>> > > >
>> >
>> > --------------------------------
>> > Bernard Chardonneau (France)
>> > Phone : [33] 9 72 36 32 90
>> > GSM phone : [33] 7 69 46 16 31
>> >
>> > An alternative Apertium translation website :
>> > http://apertiumtrad.tuxfamily.org
>> >
>> > Multilingual websites for my free softwares :
>> > http://libremail.free.fr and http://libremail.tuxfamily.org
>> > http://cyloop.tuxfamily.org (mainly translated with Apertium)
>> >
>> > My general website (in french only)
>> > http://bech.free.fr
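
One possible way to track down where unexpected lemmas come from is to keep the full expanded line next to each extracted lemma, so the surface form and the complete tag string stay visible. The following is only a sketch built on the patterns explained in the quoted message. The function name lemma_with_source and the three sample lines are made up for illustration; they are not taken from apertium-swe.swe.dix, and the sketch assumes plain "surface:lemma<tags>" lines on standard input.

```shell
#!/bin/sh
# Hypothetical helper (not part of Apertium): reads "surface:lemma<tags>" lines
# on stdin and, for every line whose FIRST tag is $1, prints
#   lemma | original line
# so you can see which analysis put a lemma into the list.
lemma_with_source() {
    tag="$1"
    # Same pattern as in the thread, but anchored at the start of the line.
    grep -E "^[^<:>]+:[^<:>]+<${tag}>" |
        # \1 is the lemma, & is the whole matched line.
        sed -E 's/^[^<:>]+:([^<:>]+)<.*$/\1 | &/'
}

# Invented sample input, only to show the output shape:
printf '%s\n' \
    'snabb:snabb<adj><pst>' \
    'abbrevierad:abbreviera<adj><pp>' \
    'springa:springa<vblex><inf>' |
    lemma_with_source adj

# Intended use on the real data (assumes lt-expand is installed):
#   lt-expand apertium-swe.swe.dix | lemma_with_source adj | less
```

On the invented sample this prints the two lines whose first tag is <adj>, each prefixed with its lemma. Whether the real dictionary contains entries of that shape is exactly what the output would have to show; the sample does not claim that it does.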
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff