[Apertium-stuff] Frequence list of non-translated words

Per Tunedal Sun, 11 Nov 2012 13:23:46 -0800

Hi,
I have tried translating some texts and saved the output in a large
text file, including all the error marks. I would like a frequency list
of the words that get a particular mark.


Example:

Odenses *infrastrukur är präglad av @beliggenhed vid Odense Kanal, som
*forbinder Odense Hamn med Odense Fjord. Den blev byggd i @åre omkring
år 1800 och ger entré från vattnet till stadens centrum. *Herudover har
den #ha betydelse for @infrastruktur vid placeringen av
*kraftvarmeværket *Fynsværket och den tidigare *losseplads på Stege Ö.

I looked at the page:

http://wiki.apertium.org/wiki/One-liners

and found the scripts:

     Get unknown words from chunked text and sort by frequency: 

sed 's/\$\W*\^/$\n^/g' | grep '@' | sed 's/><.*/>$/g' | sort -f | uniq -ci | sort -gr

tr " " "\n" | grep "@" | tr -d "[:punct:]" | sort | uniq -c | sort -r

Unfortunately, I cannot work out how to use them. How do I specify the
input and output files?

BTW, what scripting language is this?

Yours,
Per Tunedal
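
For reference, the one-liners quoted above are Unix shell pipelines built
from standard tools (sed, grep, tr, sort, uniq). Below is a minimal sketch
of how the second one could be run against files, assuming a POSIX shell.
The file names translation.txt and frequency.txt are hypothetical, and the
two sample lines are made up only so that the sketch runs on its own. The
final sort uses -rn rather than the original -r so that counts are compared
as numbers.

```shell
# Hypothetical file names; replace them with your own.
input=translation.txt
output=frequency.txt

# Made-up sample input, only so the sketch is self-contained.
printf '%s\n' \
  'av @beliggenhed vid Odense Kanal' \
  'betydelse for @infrastruktur vid @beliggenhed' > "$input"

# "< file" feeds the input file to the first command in the pipeline;
# "> file" writes the result of the last command to the output file.
tr ' ' '\n' < "$input" |   # put one word on each line
  grep '@' |               # keep only the words marked with @
  tr -d '[:punct:]' |      # strip punctuation, including the @ itself
  sort |                   # group identical words together
  uniq -c |                # count each distinct word
  sort -rn > "$output"     # most frequent first

cat "$output"
```

To list words with a different mark, such as * or #, the pattern given to
grep would be changed accordingly (for example grep '#').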
