making this consistent would be very helpful.

On Thu, Feb 28, 2013 at 9:33 AM, Marty Kube <[email protected]> wrote:

> Hey,
>
> I've been looking at consuming ARFF files for random forest classification.
>
> If you look at the partial implementation example page, one is asked to
> download an ARFF file, edit the ARFF file to remove the meta-data, and then
> recreate the same meta-data with command line arguments to the Describe
> utility (plus a scan of the data to find enumerated values).  I thought it
> would be much nicer if we could just read the meta-data from the ARFF file.
>
> I've been using the ARFF integration which generates a meta-data file and
> sequence file with vectors from an ARFF file.  The plan is to then read the
> meta-data and sequence file in the RF classifiers.
>
> So here is my question.  The random forest classifiers use a binary file
> format for the metadata (generated by
> org.apache.mahout.classifier.df.tools.Describe).
> The ARFF integration writes the meta-data in a different format.  Is
> there a need to support both formats in the RF classifiers?  I was thinking
> it might be best to modify df.tools.Describe to generate/read the same
> format as the ARFF integration
> (org.apache.mahout.utils.vectors.arff.Driver).
> Does that sound like a reasonable plan?
>
> Marty
>
>
