The following page has been changed by udanax:
http://wiki.apache.org/hama/WordCountMatrix

New page:

[[TableOfContents(4)]]
----
== Abstract ==
This page shows how to construct a word count matrix from a set of files. A word count (document-word) matrix is commonly used as the input to latent semantic indexing and document clustering. Two caveats apply to raw counts: a word that appears frequently in every document is not useful for clustering, and because documents are not all the same length, a longer document will have higher word counts.
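
As a rough illustration of the idea above, the following Python sketch builds a small document-word count matrix in memory. It is not Hama code: the function names `tokenize`, `word_count_matrix`, and `normalize_rows` are hypothetical, the documents are plain strings rather than files, and dividing each row by the document length is only one possible way to offset the length effect mentioned in the abstract.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_count_matrix(documents):
    """Build a document-word matrix from a list of document strings.

    Returns (vocabulary, matrix), where matrix[i][j] is the number of
    times vocabulary[j] occurs in documents[i].
    """
    counts = [Counter(tokenize(doc)) for doc in documents]
    # Columns are the sorted set of all words seen in any document.
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


def normalize_rows(matrix):
    """Divide each row by its total count (assumed normalization scheme).

    Rows that sum to zero (empty documents) are left as all zeros.
    """
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


if __name__ == "__main__":
    docs = ["the cat sat", "the cat sat on the mat the end"]
    vocabulary, matrix = word_count_matrix(docs)
    print(vocabulary)
    for row in matrix:
        print(row)
    for row in normalize_rows(matrix):
        print([round(v, 3) for v in row])
```

In this toy input the second document is longer, so its raw counts are higher even for shared words; after row normalization each row sums to 1, which makes documents of different lengths easier to compare.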
