I recommend simply removing the org.apache.mahout.common.nlp package, unless there is a long-term plan for it. NGrams is the only class in the package, and no one seems to know what the behavior of Map<String,List<String>> NGrams.generateNGrams() should be. Furthermore, no one seems to be using it. Even if someone is using it, the code is very small and could be incorporated into the non-Mahout side of their project.
There is some (independently implemented) n-grams computation going on in org.apache.mahout.vectorizer.collocations.llr.CollocDriver, but I don't think this is related to NLP. Otherwise it might make sense to try to merge the functionality (eventually).

-Tim

On Sat, Nov 3, 2012 at 12:25 PM, Sean Owen <[email protected]> wrote:

> (I also don't see any usages.)
>
>
> On Sat, Nov 3, 2012 at 5:08 PM, Timothy Mann <[email protected]> wrote:
>
> > It looks like nothing in the core package is using
> > org.apache.mahout.common.nlp.NGrams. Is anyone using this class?
> >
> > -Tim
> >
> >
> > On Thu, Oct 25, 2012 at 10:22 PM, Timothy Mann <[email protected]> wrote:
> >
> > > I'm trying to write javadoc comments for
> > > org.apache.mahout.common.nlp.NGrams. generateNGramsWithoutLabel() makes
> > > sense, but I'm puzzled by the implementation of generateNGrams().
> > >
> > > Map<String,List<String>> NGrams.generateNGrams() returns a Map from
> > > 'labels' to a list of 'tokens' (where each token is an n-gram of words
> > > separated by single spaces). In the current implementation only a single
> > > ('label', list of tokens) pair is put in the map. The 'label' is just the
> > > first word extracted from the specified text. I am guessing that the
> > > returned Map is being used as a pair. What is the significance of the
> > > 'label'?
> > >
> > > Thank you for your help.
> > >
> > > -Timothy Mann
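
For anyone skimming the thread, here is a rough sketch of the behavior described in the Oct 25 message quoted above: the first word of the text is treated as the 'label', and the map holds a single (label, list of n-grams) entry. This is not the Mahout source. The class name NGramSketch and the methods ngrams() and labeledNGrams() are made up for illustration, and two details are my assumptions: that the label word is excluded from the tokens, and that only n-grams of exactly the requested size are produced. It assumes Java 8 or later for String.join.

```java
import java.util.Arrays;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; NOT the org.apache.mahout.common.nlp.NGrams source.
final class NGramSketch {

  // Returns every run of exactly `size` consecutive words, joined by single spaces.
  static List<String> ngrams(List<String> words, int size) {
    List<String> result = new ArrayList<>();
    for (int i = 0; i + size <= words.size(); i++) {
      result.add(String.join(" ", words.subList(i, i + size)));
    }
    return result;
  }

  // Treats the first word as the 'label' and maps it to the n-grams of the
  // remaining words. Assumption: the label itself is not part of any token.
  // The returned map has at most one entry, so it effectively acts as a pair.
  static Map<String, List<String>> labeledNGrams(String text, int size) {
    String trimmed = text.trim();
    if (trimmed.isEmpty() || size < 1) {
      return Collections.emptyMap();
    }
    List<String> words = Arrays.asList(trimmed.split("\\s+"));
    String label = words.get(0);
    Map<String, List<String>> result = new HashMap<>();
    result.put(label, ngrams(words.subList(1, words.size()), size));
    return result;
  }
}
```

With this sketch, NGramSketch.labeledNGrams("spam buy cheap pills now", 2) gives a one-entry map from "spam" to [buy cheap, cheap pills, pills now]. If the Map really is only used as a pair, returning a small (label, tokens) holder would make that intent clearer than a single-entry Map.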
