Hi Yuan,

The Bayes classifier takes only binary features. So in order to turn your User objects into a dataset, you need to create a tab-separated file with one line per example: the label is the key, and the space-separated features are the value. The presence of a feature makes it true; its absence makes it false.
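As a rough sketch of that conversion from in-memory objects: this is plain Java, nothing Mahout-specific. The UserDatasetWriter class, the bucket helper, the 10-wide ranges, and passing the label in separately are all assumptions made for illustration (the User class in the question has no label field, so the label has to come from somewhere else).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Mahout: turns in-memory User objects
// into "label<TAB>feature feature ..." lines.
final class UserDatasetWriter {

    // Stand-in for the User class described in the question.
    static final class User {
        final long id;
        final int age;
        final double weight;
        final boolean diabetes;

        User(long id, int age, double weight, boolean diabetes) {
            this.id = id;
            this.age = age;
            this.weight = weight;
            this.diabetes = diabetes;
        }
    }

    // Maps a number to a 10-wide range feature; bucket("Age", 23) gives "Age:20-30".
    static String bucket(String name, double value) {
        int low = (int) Math.floor(value / 10.0) * 10;
        return name + ":" + low + "-" + (low + 10);
    }

    // Builds one dataset line. A boolean attribute is written only when true,
    // because presence means true and absence means false. The id is not a feature.
    static String toLine(User user, String label) {
        List<String> features = new ArrayList<>();
        features.add(bucket("Age", user.age));
        features.add(bucket("Weight", user.weight));
        if (user.diabetes) {
            features.add("diabetes");
        }
        return label + "\t" + String.join(" ", features);
    }

    // Writes one line per user; labels.get(i) is the label for users.get(i).
    static void write(Path file, List<User> users, List<String> labels) throws IOException {
        if (users.size() != labels.size()) {
            throw new IllegalArgumentException("need exactly one label per user");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < users.size(); i++) {
            lines.add(toLine(users.get(i), labels.get(i)));
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<User> users = Arrays.asList(
                new User(1L, 23, 53.0, false),
                new User(2L, 37, 52.0, true));
        List<String> labels = Arrays.asList("healthy", "heart-attack");
        Path file = Files.createTempFile("users", ".tsv");
        write(file, users, labels);
        System.out.println("Wrote " + file);
    }
}
```

The resulting file path is what you would then hand to the classifier; how the range boundaries are chosen is up to you and your data.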
For example, if you are classifying heart-attack-prone vs. healthy individuals (I am assuming this from your data), take two labels: heart-attack and healthy. You will need to map the integer and double values to boolean features. Say you have boolean features like

    Weight:40-50 Weight:50-60 Age:20-30 Age:30-40

For user A with age = 23, weight = 53, diabetes = false, write the line

    healthy<TAB>Age:20-30 Weight:50-60

For user B with age = 37, weight = 52, diabetes = true, write

    heart-attack<TAB>Age:30-40 Weight:50-60 diabetes

Your dataset file will contain many such lines, one per user. Give the file path to the classifier and it learns the model for you. For now, the algorithm takes its data from a file, not from an in-memory data structure, and it does not use vectors.

Try the classification example (20newsgroups) to get an idea of how the classifier can be run.

Robin

On Wed, Jan 27, 2010 at 8:56 AM, Yuan Wang <[email protected]> wrote:
> Hi all,
>
> I am learning Mahout. It seems to me most of the examples load datasets from
> files using the command line. I know the Bayes classifier can work with HBase.
>
> Is there any way to build the dataset from scratch in Java code?
>
> For example, there is a User class with four attributes: ID (data type is
> long or String), age (int), weight (double), and diabetes (boolean).
> There are 100 User objects in memory. Is there a way I can convert them
> into any type of dataset that the classifier algorithm can handle?
>
> I noticed there are a vector class and InMemoryDataStore, but I don't know how
> to use them. If someone can give any hint or write down some pseudocode, that
> would be very helpful.
>
> Thanks,
> Yuan
