But if we use CSV files, how can we generate descriptors for datasets?

Cheers,

Xiaobo Gu

On Thu, Jul 14, 2011 at 1:27 AM, deneche abdelhakim <[email protected]> wrote:
> I guess yes, as long as you don't use quotes or double quotes to embed the
> fields.
>
> On Wed, Jul 13, 2011 at 2:58 PM, Xiaobo Gu <[email protected]> wrote:
>
>> So for simple datasets, which only have numeric columns and character
>> label (without blanks) category columns, can we just use CSV tools to
>> save them as standard CSV files without a header?
>>
>> On Wed, Jul 13, 2011 at 3:53 AM, deneche abdelhakim <[email protected]>
>> wrote:
>> > The current implementation doesn't support the ARFF format
>> > out-of-the-box. As described in the wiki, you need to remove the
>> > header of the file and leave only the data. Actually, this
>> > implementation is fully compatible with UCI's datasets, which are
>> > comma-separated text files. You'll also need to call the dataset
>> > description tool (see the wiki) in order to generate a proper
>> > description file (it contains the nature of each attribute:
>> > Numerical or Categorical).
>> >
>> > Yes, you can use BuildForest and TestForest to generate and use
>> > Random Forest models from the command line.
>> >
>> > On Tue, Jul 12, 2011 at 2:19 PM, Xiaobo Gu <[email protected]>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> The Random Forest partial implementation in
>> >> https://cwiki.apache.org/confluence/display/MAHOUT/Partial+Implementation
>> >> uses the ARFF file format. Is ARFF the only supported file format
>> >> when using the BuildForest and TestForest programs, and are
>> >> BuildForest and TestForest the official tools to build Random
>> >> Forest models from the command line?
>> >>
>> >> Regards,
>> >>
>> >> Xiaobo Gu
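
As an illustration of the two preparation steps discussed in this thread (removing the ARFF header so only the data remains, and deciding whether each attribute is Numerical or Categorical), here is a minimal Python sketch. It is not part of Mahout and does not produce Mahout's description file; the function names `strip_arff_header` and `guess_column_kinds` are hypothetical, and the rule "a column is numerical if every value parses as a number" is only an assumption that should be checked by hand before calling the dataset description tool. It also assumes fields contain no quotes and that `?` marks a missing value.

```python
import csv


def strip_arff_header(lines):
    """Yield only the data rows of an ARFF file (the lines after @data).

    Blank lines and '%' comment lines are dropped everywhere.
    """
    in_data = False
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if not in_data:
            # Everything up to and including the @data marker is header.
            if stripped.lower().startswith("@data"):
                in_data = True
            continue
        yield stripped


def _is_number(value):
    """Return True if the field parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def guess_column_kinds(rows):
    """Guess 'numerical' or 'categorical' for each column of CSV rows.

    Heuristic (an assumption, not Mahout's logic): a column stays
    numerical only if every non-missing value parses as a number.
    """
    kinds = None
    for row in csv.reader(rows):
        if not row:
            continue
        if kinds is None:
            kinds = ["numerical"] * len(row)
        elif len(row) != len(kinds):
            raise ValueError("inconsistent number of fields: %r" % (row,))
        for i, value in enumerate(row):
            value = value.strip()
            if value == "?":
                continue  # assumed missing-value marker; tells us nothing
            if not _is_number(value):
                kinds[i] = "categorical"
    return kinds or []


if __name__ == "__main__":
    sample = [
        "% toy example",
        "@relation weather",
        "@attribute temperature numeric",
        "@attribute outlook {sunny,rainy}",
        "@data",
        "85,sunny",
        "70.5,rainy",
    ]
    data_rows = list(strip_arff_header(sample))
    print(data_rows)
    print(guess_column_kinds(data_rows))
```

For a file that is already a headerless CSV, `guess_column_kinds` can be applied to its lines directly. Note that integer-coded categories (for example, class labels written as 0/1) will be reported as numerical by this heuristic, so the result is only a starting point for the descriptor you pass to the description tool.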
