Carsten Haitzler (The Rasterman) wrote:
> it does have concept of frequency orf words. i just dont have any DATA for
> that. the dict format handles is:
> word1
> word2
> word3
>
> OR
> word1 20
> word2 434
> word3 1

I was thinking about a way to automate this a while ago, and I have just written something. The basic idea is to take the number of Google results for each word and use that value as its frequency number. (I know these numbers are often far too large, so I guess they should be re-analyzed and scaled down, but I had no time to do that now :P)

So here is a little utility I wrote [1] that checks the frequency of each word and writes back a new dictionary with frequency data. To run it you need php-cli (v5 or above, I guess): set the given options, run "php words-popularity.php" and wait for the work to finish! :P It could take a long time, but it should give good results.

PS: I used PHP because I run the script both on my PC and on a server (dividing the work) where I have ssh access but can run only a small subset of languages from the command line, and PHP is one of them.

[1] http://3v1n0.tuxfamily.org/openmoko/words-popularity.phps

--
Treviño's World - Life and Linux
http://www.3v1n0.net/
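
As a rough illustration of the "re-analyze and lower the numbers" step mentioned above, here is a minimal sketch in Python. It is not the PHP script from [1]. It assumes the raw result counts have already been collected somehow; the function names, the log scaling, the 1..1000 target range and the sample counts are all my own assumptions, not something the dictionary format requires.

```python
import math


def parse_dictionary(text):
    """Read 'word' or 'word count' lines; words without a count map to None."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        has_count = len(parts) > 1 and parts[1].isdigit()
        entries[parts[0]] = int(parts[1]) if has_count else None
    return entries


def scale_counts(raw_counts, max_freq=1000):
    """Compress large raw counts into 1..max_freq using a log scale.

    Assumption: a log scale is only one possible way to tame the huge
    search-result numbers; max_freq=1000 is an arbitrary choice.
    """
    top = max(raw_counts.values(), default=0)
    if top <= 0:
        return {word: 1 for word in raw_counts}
    log_top = math.log1p(top)
    scaled = {}
    for word, hits in raw_counts.items():
        value = round(max_freq * math.log1p(max(hits, 0)) / log_top)
        scaled[word] = max(1, value)  # never write a zero frequency
    return scaled


def format_dictionary(freqs):
    """Write entries back in the 'word frequency' layout, one per line."""
    return "".join(f"{word} {freq}\n" for word, freq in freqs.items())


# Hypothetical raw counts, made up for illustration only.
raw = {"the": 1000000, "phone": 50000, "zyzzyva": 10}
print(format_dictionary(scale_counts(raw)), end="")
```

The most common word always ends up at max_freq and everything else is placed relative to it, so the ordering of the words is kept while the values stay small.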