Hi,
Strange, I cannot get the script working. It has worked in the past.
I get error messages on sed:
fgrep "<e lm=" apertium-sv-da.da.dix | sed "s/.*n=\"// s/\".*//" | sort
| uniq -c | sort -rn | grep "_n$" > paradigms-da-nouns.txt
sed: -e expression #1, char 7: unterminated "s" command
Yours,
Per Tunedal
On Mon, Sep 10, 2012, at 21:21, Jacob Nordfalk wrote:
After finding the gender, and also the plural form, you may have to
choose between dozens of paradigms (the same goes for Swedish, or any
birth language).
For that, I wrote a shell script that counts how often the different
paradigms are used in a monodix:
#!/bin/bash
# Count how often each paradigm name is used in the monodix given as $1.
fgrep "<e lm=" "$1" |
sed "s/.*n=\"//
s/\".*//" | sort | uniq -c | sort -rn
$1 is the name of the monodix, and if you want to keep only paradigms
for nouns, just add | grep "_n$" at the end.
The script only works if each word description <e lm=...>....</e> is
on one line. If that is not the case for every word, change it.
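Note that the two sed substitutions have to be separated by a newline
(as above), by a ";", or be given as separate -e options; written on
one line with only a space between them, sed rejects the expression.
A minimal sketch of a one-line form, tried on a small made-up sample
(the entries and the function name count_paradigms are only
illustrative, not taken from any real dictionary):

```shell
#!/bin/bash
# Hypothetical sample monodix fragment, one <e> entry per line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<e lm="hus"><i>hus</i><par n="hus__n"/></e>
<e lm="bord"><i>bord</i><par n="hus__n"/></e>
<e lm="bil"><i>bil</i><par n="bil__n"/></e>
<e lm="stor"><i>stor</i><par n="stor__adj"/></e>
EOF

# Same pipeline as the script, but each substitution gets its own -e,
# so the whole command fits on one line.
count_paradigms() {
    grep -F '<e lm=' "$1" |
        sed -e 's/.*n="//' -e 's/".*//' |
        sort | uniq -c | sort -rn
}

count_paradigms "$sample"                 # all paradigms, most used first
count_paradigms "$sample" | grep '_n$'    # nouns only
rm -f "$sample"
```

Single quotes around the sed expressions avoid having to escape the
double quotes.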
After that, you will see that for each big category of words (nouns,
proper nouns, adjectives, verbs), only a few paradigms are used very
often (about 3 main paradigms, and 5 paradigms cover nearly every
word of the category).
So examine what is written in these main paradigms (gender + plural
form). Your new words have a good chance of matching one of these
main paradigms. If not, go further down the list.
In that case, look at paradigms with a name like root/suffix__n
(there will be a / in the name). If you find a paradigm with a
suffix matching the last letters of your noun (in its singular form),
it may be the right one.
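The suffix check can be sketched in shell as well. This assumes the
input is the "count name" list produced by the counting script; the
function name list_matching and the paradigm names in the example are
made up for illustration:

```shell
#!/bin/bash
# list_matching NOUN: reads "count name" lines on stdin and prints the
# root/suffix__n paradigm names whose suffix ends the given noun.
# (Hypothetical helper, not part of Apertium.)
list_matching() {
    local noun=$1 count name suffix
    while read -r count name; do
        case $name in
            */*__n) ;;           # keep only names of the form root/suffix__n
            *) continue ;;
        esac
        suffix=${name#*/}        # drop "root/"
        suffix=${suffix%__n}     # drop the trailing "__n"
        case $noun in
            *"$suffix") printf '%s\n' "$name" ;;
        esac
    done
}

# Example with invented counts and paradigm names:
printf '%s\n' '  12 flick/a__n' '   7 hus__n' '   3 pojk/e__n' |
    list_matching gata
```

A match only means the ending fits; the gender and plural form in the
paradigm still have to be checked by hand.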
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff