Making this consistent would be very helpful.

On Thu, Feb 28, 2013 at 9:33 AM, Marty Kube <[email protected]> wrote:
> Hey,
>
> I've been looking at consuming ARFF files for random forest
> classification.
>
> If you look at the partial implementation example page, one is asked to
> download an ARFF file, edit the ARFF file to remove the meta-data, and then
> recreate the same meta-data with command line arguments to the Describe
> utility (plus a scan of the data to find enumerated values). I thought it
> would be much nicer if we could just read the meta-data from the ARFF file.
>
> I've been using the ARFF integration, which generates a meta-data file and
> a sequence file with vectors from an ARFF file. The plan is to then read
> the meta-data and sequence file in the RF classifiers.
>
> So here is my question. The random forest classifiers use a binary file
> format for the meta-data (generated by
> org.apache.mahout.classifier.df.tools.Describe).
> The ARFF integration writes the meta-data in a different format. Is
> there a need to support both formats in the RF classifiers? I was thinking
> it might be best to modify df.tools.Describe to generate/read the same
> format as the ARFF integration
> (org.apache.mahout.utils.vectors.arff.Driver).
> Does that sound like a reasonable plan?
>
> Marty
