You don't need to convert the CSV file to ARFF; you can use it right away. You can also use a small dataset, as long as every value of each categorical attribute appears in it.
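The condition above gives a simple way to build such a small dataset: go through the full file and keep any row that introduces a categorical value not seen before. Here is a minimal Java sketch (Java 9+ for `List.of`/`Set.of`). It assumes a header-less, comma-separated file with no quoted fields, and that you already know which column indices are categorical, including the label. `CoveringSample` and `coveringSample` are hypothetical names and are not part of Mahout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CoveringSample {

    // Hypothetical helper: returns a subset of rows in which every value of every
    // categorical column (given by index) occurs at least once.
    // Assumes plain comma-separated rows, no quoted fields, all columns present.
    static List<String> coveringSample(List<String> rows, Set<Integer> categoricalCols) {
        Map<Integer, Set<String>> seen = new HashMap<>();
        for (int col : categoricalCols) {
            seen.put(col, new HashSet<>());
        }
        List<String> sample = new ArrayList<>();
        for (String row : rows) {
            String[] fields = row.split(",", -1);
            boolean addsNewValue = false;
            // Check every categorical column (no short-circuit) so all new values get recorded.
            for (int col : categoricalCols) {
                // Set.add returns true only if the value was not seen before.
                if (seen.get(col).add(fields[col].trim())) {
                    addsNewValue = true;
                }
            }
            if (addsNewValue) {
                sample.add(row);
            }
        }
        return sample;
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
                "5.1,red,yes",
                "4.9,red,yes",
                "6.3,blue,no",
                "5.8,blue,yes",
                "7.0,green,no");
        // Columns 1 and 2 are categorical in this toy example (column 2 plays the label).
        List<String> sample = coveringSample(rows, Set.of(1, 2));
        sample.forEach(System.out::println);
    }
}
```

On the toy rows this keeps `5.1,red,yes`, `6.3,blue,no` and `7.0,green,no`, which are enough to expose every categorical value to the description tool. Numerical columns are ignored by design, since their values do not need to be enumerated.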
On Fri, Jul 15, 2011 at 2:28 PM, Xiaobo Gu wrote:
> Can we make the file descriptor as follows:
>
> 1. Make a small CSV file with the same format as the actual dataset,
>    say a CSV file with a header and only one record.
> 2. Use "java weka.core.converters.CSVLoader filename.csv > filename.arff"
>    to convert the small CSV into an ARFF file; see
>    http://maya.cs.depaul.edu/classes/ect584/weka/preprocess.html
> 3. Use org.apache.mahout.df.tools.Describe to generate a descriptor.
>
> The only concern here is: is a small CSV file with one record
> sufficient to generate the ARFF file header, or do we have to
> use the whole file to avoid losing information?
>
> Xiaobo Gu
>
> On Fri, Jul 15, 2011 at 9:10 PM, Xiaobo Gu wrote:
> > But if we use CSV files, how can we generate descriptors for datasets?
> >
> > Cheers
> >
> > Xiaobo Gu
> >
> > On Thu, Jul 14, 2011 at 1:27 AM, deneche abdelhakim wrote:
> >> I guess yes, as long as you don't use quotes or double quotes to
> >> enclose the fields.
> >>
> >> On Wed, Jul 13, 2011 at 2:58 PM, Xiaobo Gu wrote:
> >>
> >>> So for simple datasets, which only have numeric columns and category
> >>> columns with character labels (without blanks), can we just use CSV
> >>> tools to save them as a standard CSV file without a header?
> >>>
> >>> On Wed, Jul 13, 2011 at 3:53 AM, deneche abdelhakim wrote:
> >>> > The current implementation doesn't support the ARFF format
> >>> > out-of-the-box. As described in the wiki, you need to remove the
> >>> > header of the file and leave only the data. Actually, this
> >>> > implementation is fully compatible with UCI's datasets, which are
> >>> > comma-separated text files. You'll also need to call the dataset
> >>> > description tool (see the wiki) in order to generate a proper
> >>> > description file (it contains the nature of each attribute:
> >>> > Numerical or Categorical).
> >>> >
> >>> > Yes, you can use BuildForest and TestForest to generate and use
> >>> > Random Forest models from the command line.
> >>> >
> >>> > On Tue, Jul 12, 2011 at 2:19 PM, Xiaobo Gu wrote:
> >>> >
> >>> >> Hi,
> >>> >>
> >>> >> The Random Forest partial implementation in
> >>> >> https://cwiki.apache.org/confluence/display/MAHOUT/Partial+Implementation
> >>> >> uses the ARFF file format. Is ARFF the only supported file format
> >>> >> when using the BuildForest and TestForest programs, and are the
> >>> >> BuildForest and TestForest programs official tools to build Random
> >>> >> Forest models from the command line?
> >>> >>
> >>> >> Regards,
> >>> >>
> >>> >> Xiaobo Gu
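If you start from an ARFF file, as in the wiki example, the quoted advice to remove the header and keep only the data can be sketched in a few lines of Java. This is a minimal sketch that assumes a dense (non-sparse) ARFF file. It drops everything up to and including the `@data` line, skips blank and `%` comment lines, and rejects rows containing quotes, since the thread warns that quoted fields are not handled. `ArffToCsv` and `dataLines` are hypothetical names, not Mahout or Weka classes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ArffToCsv {

    // Hypothetical helper: keeps only the data rows of a dense ARFF file,
    // i.e. everything after the @data line, dropping blank and % comment lines.
    static List<String> dataLines(List<String> arffLines) {
        List<String> out = new ArrayList<>();
        boolean inData = false;
        for (String line : arffLines) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("%")) {
                continue; // blank line or ARFF comment
            }
            if (!inData) {
                // The header ends at the @data declaration (ARFF keywords are case-insensitive).
                if (t.equalsIgnoreCase("@data")) {
                    inData = true;
                }
                continue;
            }
            if (t.contains("'") || t.contains("\"")) {
                // Quoted fields are not supported, so fail loudly instead of guessing.
                throw new IllegalArgumentException("quoted field in row: " + t);
            }
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ArffToCsv input.arff output.csv");
            System.exit(1);
        }
        List<String> rows = dataLines(Files.readAllLines(Paths.get(args[0])));
        Files.write(Paths.get(args[1]), rows); // one data row per line, UTF-8
    }
}
```

The output is a header-less, comma-separated file of the kind the thread describes. The attribute types that the ARFF header carried still have to be supplied separately via the dataset description tool.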
