You don't need to convert the CSV file to ARFF; you can use it directly.

You can use a small dataset to generate the descriptor, as long as every
value of each categorical attribute appears in that dataset.
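
One way to build such a small dataset is sketched below. This is only an illustration, not part of Mahout: the function name covering_sample, the sample rows, and the choice of columns 1 and 3 as categorical are all assumptions made for the example. It keeps a row only when that row contributes a categorical value not seen before, so the resulting sample contains every categorical value of the full file.

```python
import csv
import io


def covering_sample(rows, categorical_cols):
    """Return a subset of rows in which every value found in the
    categorical columns of `rows` appears at least once."""
    seen = {col: set() for col in categorical_cols}
    sample = []
    for row in rows:
        # Keep the row only if it adds a categorical value not yet covered.
        adds_new_value = any(row[col] not in seen[col] for col in categorical_cols)
        if adds_new_value:
            sample.append(row)
            for col in categorical_cols:
                seen[col].add(row[col])
    return sample


# Hypothetical headerless CSV: numeric, categorical, numeric, categorical label.
data = io.StringIO(
    "5.1,red,3.5,yes\n"
    "4.9,red,3.0,no\n"
    "6.2,blue,2.9,yes\n"
    "5.0,red,3.4,yes\n"
)
rows = list(csv.reader(data))
sample = covering_sample(rows, categorical_cols=[1, 3])
```

The sample can then be written back out with csv.writer and passed to the description tool in place of the full file. The selection is greedy, so it is not guaranteed to be the smallest possible sample, only a complete one.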

On Fri, Jul 15, 2011 at 2:28 PM, Xiaobo Gu <[email protected]> wrote:

> Can we generate the file descriptor as follows:
>
> 1. Make a small CSV file with the same format as the actual dataset,
> say a CSV file with a header and only one record.
> 2. Use java weka.core.converters.CSVLoader filename.csv >
> filename.arff  to convert the small CSV into an ARFF file, see
> http://maya.cs.depaul.edu/classes/ect584/weka/preprocess.html
> 3. Use org.apache.mahout.df.tools.Describe to generate a descriptor.
>
>
> The only concern here is: is a small CSV file with one record
> sufficient to generate the ARFF file header, or do we have to
> use the whole file to avoid losing information?
>
>
> Xiaobo Gu
>
>
>
>
> On Fri, Jul 15, 2011 at 9:10 PM, Xiaobo Gu <[email protected]> wrote:
> > But if we use CSV files, how can we generate descriptors for datasets?
> >
> > Cheers
> >
> > Xiaobo Gu
> >
> > On Thu, Jul 14, 2011 at 1:27 AM, deneche abdelhakim <[email protected]> wrote:
> >> I guess yes, as long as you don't use single or double quotes to embed
> >> the fields.
> >>
> >> On Wed, Jul 13, 2011 at 2:58 PM, Xiaobo Gu <[email protected]> wrote:
> >>
> >>> So for simple datasets, which only have numeric columns and category
> >>> columns with character labels (without blanks), can we just use CSV
> >>> tools to save them as a standard CSV file without a header?
> >>>
> >>>
> >>> On Wed, Jul 13, 2011 at 3:53 AM, deneche abdelhakim <[email protected]> wrote:
> >>> > The current implementation doesn't support the ARFF format out of
> >>> > the box. As described in the wiki, you need to remove the header of
> >>> > the file and leave only the data. In fact, this implementation is
> >>> > fully compatible with UCI's datasets, which are comma-separated text
> >>> > files. You'll also need to call the dataset description tool (see
> >>> > the wiki) to generate a proper description file (it contains the
> >>> > nature of each attribute: Numerical or Categorical).
> >>> >
> >>> > Yes, you can use BuildForest and TestForest to generate and use
> >>> > Random Forest models from the command line.
> >>> >
> >>> > On Tue, Jul 12, 2011 at 2:19 PM, Xiaobo Gu <[email protected]> wrote:
> >>> >
> >>> >> Hi,
> >>> >>
> >>> >> The Random Forest partial implementation in
> >>> >> https://cwiki.apache.org/confluence/display/MAHOUT/Partial+Implementation
> >>> >> uses the ARFF file format. Is ARFF the only supported file format
> >>> >> when using the BuildForest and TestForest programs, and are
> >>> >> BuildForest and TestForest the official tools to build Random
> >>> >> Forest models from the command line?
> >>> >>
> >>> >> Regards,
> >>> >>
> >>> >> Xiaobo Gu
> >>> >>
> >>> >
> >>>
> >>
> >
>
