Hi,
I need to do text clustering in a natural language processing context, so word order matters.
Initially, I will use an n-gram model (with n = 3). As I understand it, the Vector and SequenceFileFormat representations in Mahout do not take contextual information into account. I know I might need to modify both of them, but is there a bag-of-words implementation and a stop list that I can use?

Thanks,
Neel Sheyal
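
P.S. To make the question concrete, here is a minimal sketch of the kind of word-trigram features I have in mind. It is plain Python and does not use Mahout; the stop list and the `word_ngrams` function name are placeholders made up for illustration, and whitespace tokenization is an assumption.

```python
from collections import Counter

# Placeholder stop list for illustration only; not taken from Mahout.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "is"}


def word_ngrams(text, n=3, stop_words=STOP_WORDS):
    """Count word n-grams after lowercasing and dropping stop words."""
    # Naive whitespace tokenization; a real pipeline would use a proper analyzer.
    tokens = [t for t in text.lower().split() if t not in stop_words]
    # Slide a window of n tokens so that local word order is preserved.
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)


features = word_ngrams("the quick brown fox jumps over the lazy dog")
```

Note that because stop words are removed before the n-grams are built, words that were not adjacent in the original text (such as "over" and "lazy" here) can end up in the same trigram. Building the n-grams first and filtering afterwards would be the alternative.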
