We are pleased to announce the release of version 1.13 of the Ngram Statistics Package (NSP). This release includes several changes that you may find useful. Please try it out and let us know if you have any questions or suggestions!
This version is available from CPAN or SourceForge via the links at http://ngram.sourceforge.net

The CHANGELOG appears below.

Released March 5, 2010, all changes by TDP and YL

* Replaced huge-count.pl with a more efficient version that counts large numbers of bigrams by creating multiple files, sorting, and merging them. The sorting and merging are carried out by huge-sort.pl and huge-merge.pl. Note that the previous versions of huge-count.pl and associated utilities can be found in /Text-NSP/bin/utils/deprecated and will remain there for at least one more release. They will not, however, be installed automatically. (YL)

* Added --uremove and --ufrequency options to count.pl. These allow for frequency cutoffs based on ngrams occurring more than a given number of times (rather than just fewer than, which is what --remove and --frequency enable). This is a long-standing item on the NSP Todo list that has finally been checked off! (YL)

* Introduced /bin/utils/contributed to allow for the distribution of user-contributed programs that might be useful to other users. These programs are not installed automatically with NSP and are not included in our standard testing streams, but could still prove very useful to users. Please let us know if you have code you would like to include here. (TDP)

* Added nsp-stoplist.regex to the distribution (in /Text-NSP/bin/utils), to serve as a default stoplist. (TDP) Reported here: http://tech.groups.yahoo.com/group/ngram/message/280 This was not added in 1.11 due to a failure to rebuild the MANIFEST.

* Added support for 4-d log-likelihood (Text::NSP::Measures::4D::MI::ll). (TDP)

Enjoy,
Ted and Ying

--
Ted Pedersen
http://www.d.umn.edu/~tpederse