ARFF file support for random forest classifiers

Marty Kube Thu, 28 Feb 2013 09:33:37 -0800

Hey,

I've been looking at consuming ARFF files for random forest classification.

The partial implementation example page asks you to download an ARFF file, edit it to remove the meta-data, and then recreate the same meta-data with command line arguments to the Describe utility (plus a scan of the data to find enumerated values). I thought it would be much nicer if we could just read the meta-data from the ARFF file.
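
As a rough illustration of reading the meta-data straight from the ARFF header, here is a minimal sketch. It is not Mahout code: the class ArffHeaderSketch, its Attribute type and the parseAttributes method are hypothetical names, and the header in main is made up. It assumes simple one-line @attribute declarations with unquoted names and no commas inside nominal values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch (not Mahout code): collects attribute meta-data from an ARFF header. */
public class ArffHeaderSketch {

  /** One declared attribute; values is empty unless the attribute is nominal. */
  public static final class Attribute {
    public final String name;
    public final String type;
    public final List<String> values;

    Attribute(String name, String type, List<String> values) {
      this.name = name;
      this.type = type;
      this.values = values;
    }

    @Override
    public String toString() {
      return name + " : " + type + (values.isEmpty() ? "" : " " + values);
    }
  }

  /** Parses the @attribute lines that precede @data. */
  public static List<Attribute> parseAttributes(List<String> lines) {
    List<Attribute> attributes = new ArrayList<>();
    for (String raw : lines) {
      String line = raw.trim();
      String lower = line.toLowerCase(Locale.ROOT);
      if (lower.startsWith("@data")) {
        break; // header ends here; the data rows are not needed for the meta-data
      }
      if (!lower.startsWith("@attribute")) {
        continue; // skips blank lines, % comments and @relation
      }
      String[] parts = line.substring("@attribute".length()).trim().split("\\s+", 2);
      if (parts.length < 2) {
        continue; // malformed declaration; a real reader should report it
      }
      String name = parts[0];
      String spec = parts[1].trim();
      if (spec.startsWith("{") && spec.endsWith("}")) {
        // Nominal attribute: the enumerated values are listed in the header itself.
        List<String> values = new ArrayList<>();
        for (String value : spec.substring(1, spec.length() - 1).split(",")) {
          values.add(value.trim());
        }
        attributes.add(new Attribute(name, "nominal", values));
      } else {
        // Any other declared type (numeric, string, ...) is kept as written, lower-cased.
        attributes.add(new Attribute(name, spec.toLowerCase(Locale.ROOT),
            Collections.<String>emptyList()));
      }
    }
    return attributes;
  }

  public static void main(String[] args) {
    List<String> header = Arrays.asList(
        "% made-up example header",
        "@relation weather",
        "@attribute temperature numeric",
        "@attribute outlook {sunny, overcast, rainy}",
        "@attribute play {yes, no}",
        "@data",
        "85,sunny,no");
    for (Attribute attribute : parseAttributes(header)) {
      System.out.println(attribute);
    }
  }
}
```

Because nominal attributes list their values in the header, a reader along these lines would not need the extra scan of the data to find enumerated values.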

I've been using the ARFF integration, which generates a meta-data file and a sequence file of vectors from an ARFF file. The plan is to then read the meta-data and sequence file in the RF classifiers.

So here is my question. The random forest classifiers use a binary file format for the meta-data (generated by org.apache.mahout.classifier.df.tools.Describe). The ARFF integration writes the meta-data in a different format. Is there a need to support both formats in the RF classifiers? I was thinking it might be best to modify df.tools.Describe to generate/read the same format as the ARFF integration (org.apache.mahout.utils.vectors.arff.Driver). Does that sound like a reasonable plan?


Marty
