I believe Ken Lettow has a Porter stemmer (extracts roots of words to reduce the noise of variable suffixes: http://tartarus.org/martin/PorterStemmer/), which would probably be an early step in any attempt at this sort of thing.
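To make the "early step" concrete, here is a rough Python sketch of where stemming sits in a vector space pipeline. It is only an illustration under loud assumptions: `crude_stem` and its suffix list are hypothetical stand-ins and NOT the Porter algorithm (which applies several ordered rule phases), and `term_vector` and `cosine` are names made up for this example, not taken from any J or Python library.

```python
import math
import re
from collections import Counter

# Hypothetical toy suffix list: a crude stand-in for a real stemmer,
# NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_vector(text):
    """Map a document to a sparse vector of stemmed-term counts."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(w) for w in words)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    # Counter returns 0 for missing terms, so unshared terms add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    query = term_vector("indexing documents")
    doc = term_vector("indexed document")
    print(cosine(query, doc))
```

Without the stemming step, "indexing documents" and "indexed document" would share no terms at all, which is the suffix noise mentioned above. A real system would swap in a proper Porter implementation and probably weight the counts (tf-idf, for instance) before comparing.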
On Wed, Sep 9, 2015 at 5:11 PM, Wendell P wrote:
> Looking through the J list archives, I don't see any interest in this
> line of work. Are there any active projects in information retrieval or
> computational semantics using J? Can you think of reasons why it might
> not be a good choice?
>
> Development is usually in C++ or Java, but I've been thinking that J or
> K/Q might be a good choice, allowing greater programmer productivity
> without sacrificing performance.
>
> The vector space model[1] underlies much work in information
> retrieval[2] and computational semantics[3]. Two good surveys of the
> application of matrix methods in this area are Berry[4] and Elden[5].
>
> [1] https://en.wikipedia.org/wiki/Vector_space_model
> [2] e.g. https://en.wikipedia.org/wiki/Lucene
> [3] e.g. https://code.google.com/p/word2vec/
> [4] Understanding Search Engines: Mathematical Modeling and Text
> Retrieval 2/e (2005)
> [5] Matrix Methods in Data Mining and Pattern Recognition (2007)

--
Devon McCormick, CFA

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
