Do the -p and -f options of org.apache.mahout.df.tools.Describe have to be HDFS URLs, or can they be local file system paths?

On Fri, Jul 15, 2011 at 9:28 PM, Xiaobo Gu <[email protected]> wrote:
> Can we make the file descriptor as follows?
>
> 1. Make a small CSV file with the same format as the actual dataset,
>    say a CSV file with a header and only one record.
> 2. Use "java weka.core.converters.CSVLoader filename.csv > filename.arff"
>    to convert the small CSV into an ARFF file, see
>    http://maya.cs.depaul.edu/classes/ect584/weka/preprocess.html
> 3. Use org.apache.mahout.df.tools.Describe to generate a descriptor.
>
> The only concern here is: is the small CSV file with one record
> sufficient to generate the ARFF file header, or do we have to
> use the whole file to avoid losing information?
>
> Xiaobo Gu
>
> On Fri, Jul 15, 2011 at 9:10 PM, Xiaobo Gu <[email protected]> wrote:
>> But if we use CSV files, how can we generate descriptors for datasets?
>>
>> Cheers
>>
>> Xiaobo Gu
>>
>> On Thu, Jul 14, 2011 at 1:27 AM, deneche abdelhakim <[email protected]>
>> wrote:
>>> I guess yes, as long as you don't use quotes or double quotes to
>>> embed the fields.
>>>
>>> On Wed, Jul 13, 2011 at 2:58 PM, Xiaobo Gu <[email protected]> wrote:
>>>
>>>> So for simple datasets, which only have numeric columns and
>>>> category columns with character labels (without blanks), can we
>>>> just use CSV tools to save them as a standard CSV file without a
>>>> header?
>>>>
>>>> On Wed, Jul 13, 2011 at 3:53 AM, deneche abdelhakim <[email protected]>
>>>> wrote:
>>>> > The current implementation doesn't support the ARFF format
>>>> > out-of-the-box; as described in the wiki, you need to remove the
>>>> > header of the file and leave only the data. Actually, this
>>>> > implementation is fully compatible with UCI's datasets, which are
>>>> > comma-separated text files. You'll also need to call the dataset
>>>> > description tool (see the wiki) in order to generate a proper
>>>> > description file (it contains the nature of each attribute:
>>>> > Numerical or Categorical).
>>>> >
>>>> > Yes, you can use BuildForest and TestForest to generate and use
>>>> > Random Forest models from the command line.
>>>> >
>>>> > On Tue, Jul 12, 2011 at 2:19 PM, Xiaobo Gu <[email protected]>
>>>> > wrote:
>>>> >
>>>> >> Hi,
>>>> >>
>>>> >> The Random Forest partial implementation in
>>>> >> https://cwiki.apache.org/confluence/display/MAHOUT/Partial+Implementation
>>>> >> uses the ARFF file format. Is ARFF the only supported file format
>>>> >> when using the BuildForest and TestForest programs, and are
>>>> >> BuildForest and TestForest the official tools to build Random
>>>> >> Forest models from the command line?
>>>> >>
>>>> >> Regards,
>>>> >>
>>>> >> Xiaobo Gu
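
For reference, the step mentioned in the thread of removing the ARFF header and leaving only the data could be sketched in Python as below. This is only a minimal sketch and not part of Mahout or Weka: the function name strip_arff_header and the sample data are hypothetical, and it assumes a plain (non-sparse) ARFF file in which everything up to and including the @data line is header and lines starting with % are comments.

```python
def strip_arff_header(lines):
    """Yield only the data rows of an ARFF file, dropping the header.

    Assumes a plain ARFF layout: header lines (@relation, @attribute),
    then an @data line, then comma-separated rows.
    """
    in_data = False
    for line in lines:
        stripped = line.strip()
        if not in_data:
            # Skip everything up to and including the @data marker.
            if stripped.lower().startswith("@data"):
                in_data = True
            continue
        # Inside the data section, skip blank lines and % comments.
        if not stripped or stripped.startswith("%"):
            continue
        yield stripped


# Hypothetical sample file contents, for illustration only.
sample = """% a tiny example
@relation flowers
@attribute sepal_length numeric
@attribute sepal_width numeric
@attribute species {setosa,versicolor}
@data
5.1,3.5,setosa

% a comment inside the data section
7.0,3.2,versicolor
"""

rows = list(strip_arff_header(sample.splitlines()))
for row in rows:
    print(row)
```

The result is a headerless comma-separated file of the kind described above. It does not handle quoted fields, which matches the caveat in the thread about not using quotes to embed fields.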
