[Apertium-stuff] Frequence list of paradigms

Per Tunedal Tue, 05 Feb 2013 04:19:57 -0800

Hi,
Strange, I cannot get the script to work, although it has worked in the past.

I get an error message from sed:


fgrep "<e lm=" apertium-sv-da.da.dix | sed "s/.*n=\"// s/\".*//" | sort |
uniq -c | sort -rn | grep "_n$" > paradigms-da-nouns.txt

sed: -e uttryck #1, tecken 7: oavslutat "s"-kommando
(in English: sed: -e expression #1, char 7: unterminated `s' command)
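
One likely cause, assuming GNU sed: in the original script the two s commands sit on separate lines, and sed needs a newline or a semicolon between commands. Joined on one line with only a space between them, sed cannot tell where the first command ends. A minimal sketch of the same pipeline with each s command passed as its own -e expression follows; the sample dictionary and its entries are made up for illustration.

```shell
#!/bin/bash
# Tiny made-up dictionary, one <e lm=...> entry per line (illustration only).
dix=$(mktemp)
cat > "$dix" <<'EOF'
<e lm="hus"><i>hus</i><par n="hus__n"/></e>
<e lm="pige"><i>pig</i><par n="pig/e__n"/></e>
<e lm="uge"><i>ug</i><par n="pig/e__n"/></e>
<e lm="stor"><i>stor</i><par n="stor__adj"/></e>
EOF

# Same pipeline, but with the two s commands given as separate -e
# expressions so that sed can tell where the first one ends.
result=$(grep -F '<e lm=' "$dix" |
  sed -e 's/.*n="//' -e 's/".*//' |
  sort | uniq -c | sort -rn | grep '_n$')
echo "$result"
rm -f "$dix"
```

Writing the two commands as "s/.*n=\"//; s/\".*//" with a semicolon between them should work as well.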

Yours,
Per Tunedal


On Mon, Sep 10, 2012, at 21:21, Jacob Nordfalk wrote:

  After finding the gender, and also the plural form, you may have to
  choose between dozens of paradigms (the same goes for Swedish, or any
  native language).
  For that, I wrote a shell script that counts how often the different
  paradigms are used in a monodix:
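  #!/bin/bash
  # Count how often each paradigm is used in the monodix given as $1.
  # sed keeps only the paradigm name (the text after the last n=" up to
  # the next "), with one s command per line.
  fgrep "<e lm=" "$1" |
  sed "s/.*n=\"//
       s/\".*//" | sort | uniq -c | sort -rn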
  $1 is the name of the monodix, and if you want to keep only the
  paradigms for nouns, just add   | grep "_n$"   at the end.
  The script works when each word description <e lm=...>....</e> is
  on one line. If that is not the case for every word, change it.
  After that, you will see that for every big category of words (nouns,
  proper nouns, adjectives, verbs), only a few paradigms are used
  very often (about 3 main paradigms, and 5 paradigms cover nearly
  every word of the category).
  So examine what is written in these main paradigms (gender +
  plural form). Your new words have a good chance of matching
  one of these main paradigms. If not, go further down the list.
  In that case, watch for paradigms with a name like   root/suffix__n
  (there will be a  /  in the name). If you find a paradigm whose
  suffix matches the last letters of your noun (in the singular form),
  it may be the right one.
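
The suffix check described above could be sketched roughly as follows. This is only an illustration, not part of Apertium: match_paradigms is a hypothetical helper, the input is assumed to be in the "count name" format produced by uniq -c, and the paradigm names and counts are made up.

```shell
#!/bin/bash
# Read "count name" lines (as printed by uniq -c) on stdin and print those
# whose name looks like root/suffix__n with a non-empty suffix that matches
# the last letters of the given noun (singular form).
# match_paradigms is a hypothetical helper, not an Apertium tool.
match_paradigms() {
  local noun=$1 count name suffix
  while read -r count name; do
    case $name in
      */*__n)
        suffix=${name#*/}      # drop "root/"
        suffix=${suffix%__n}   # drop the trailing "__n"
        # Skip empty suffixes: they would match every noun.
        [ -n "$suffix" ] || continue
        case $noun in
          *"$suffix") echo "$count $name" ;;
        esac
        ;;
    esac
  done
}

# Made-up frequency list, for illustration only.
matches=$(printf '%s\n' \
  '    312 pig/e__n' \
  '    120 hus__n' \
  '     80 bil/__n' \
  '     45 kvind/e__n' \
  '      7 muse/um__n' | match_paradigms 'uge')
echo "$matches"
```

The candidates come out in the same order as the frequency list, so the most used paradigms are listed first. A match on the suffix is only a hint; the gender and plural form in the paradigm still have to be checked by hand.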
