On Sun, 11 Nov 2012 at 22:23 +0100, Per Tunedal wrote:
> Hi,
> I have tried translating some texts and got the translation in a large
> text file with all the error codes. I would like a frequency list for
> the words that get a certain error.
>
> Example:
>
> Odenses *infrastrukur är präglad av @beliggenhed vid Odense Kanal, som
> *forbinder Odense Hamn med Odense Fjord. Den blev byggd i @åre omkring
> år 1800 och ger entré från vattnet till stadens centrum. *Herudover har
> den #ha betydelse for @infrastruktur vid placeringen av
> *kraftvarmeværket *Fynsværket och den tidigare *losseplads på Stege Ö.
>
> I looked at the page:
>
> http://wiki.apertium.org/wiki/One-liners
>
> and found the scripts:
>
> Get unknown words from chunked text and sort by frequency:
>
> sed 's/\$\W*\^/$\n^/g' | grep '@' | sed 's/><.*/>$/g' | sort -f | uniq -ci | sort -gr
>
> tr " " "\n" | grep "@" | tr -d "[:punct:]" | sort | uniq -c | sort -r
>
> But, unfortunately I cannot understand how to use them. How to enter the
> input and output file?

Try this:

$ cat ~/corpora/north_germanic_bibles/bible.da/book001.chapter001.txt |
    apertium -d . da-sv-biltrans |
    sed 's/\$\W*\^/$\n^/g' | grep '@' | sort -f | uniq -ci | sort -gr

Where the file after 'cat' is the corpus you want to use.

> BTW What's the scripting language?

That's bash (a shell pipeline of standard Unix tools such as sed, grep, sort and uniq).

Fran

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
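
On the input/output file question: the one-liners read standard input and write standard output, so one way to attach files is shell redirection. Below is a minimal sketch that applies the second one-liner from the wiki to a text file that has already been translated. The file names `translated.txt` and `at_words.freq` are made up for illustration, and so is the short sample text, which is only loosely based on the example in the question.

```shell
#!/bin/bash
# Hypothetical file names; replace them with your own.
infile=translated.txt
outfile=at_words.freq

# Made-up sample input, so that the sketch runs on its own.
cat > "$infile" <<'EOF'
Odenses *infrastrukur är präglad av @beliggenhed vid Odense Kanal.
Den har betydelse for @infrastruktur och @beliggenhed.
EOF

# '< file' feeds the file to the first command on standard input;
# '> file' writes the output of the last command to a file.
tr ' ' '\n' < "$infile" |   # one word per line
    grep '@' |              # keep only the words marked with @
    tr -d '[:punct:]' |     # strip the @ mark and any punctuation
    sort | uniq -c |        # count identical words
    sort -rn > "$outfile"   # most frequent first

cat "$outfile"
```

The sketch uses `sort -rn` instead of the wiki's `sort -r` so that the counts are compared as numbers. To collect a different mark, change the pattern given to grep, for example `grep '#'`; for `*`, use `grep -F '*'` so the asterisk is matched literally.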
