Space: Apache Mahout (https://cwiki.apache.org/confluence/display/MAHOUT) Page: File Format Integrations (https://cwiki.apache.org/confluence/display/MAHOUT/File+Format+Integrations)
Added by Lance Norskog:

There are several importers and exporters for common file formats:
* *org.apache.mahout.utils.vectors.arff.Driver* imports [ARFF|http://www.cs.waikato.ac.nz/ml/weka/arff.html] files into vectors.
* *CSVVectorIterator* imports CSV files into vectors.
* *MailProcessor* parses text-only mailboxes into a SequenceFile, with a numbered key and the text body as the value.

These are low-level classes, not Hadoop file I/O classes.

The clustering package has a couple of exporters for standard formats:
* *GraphMLClusterWriter* saves cluster data in the [GraphML|http://graphml.graphdrawing.org/] format.
* *CSVClusterWriter* saves clusters in a CSV-based format.

Both of these formats can be read by [Gephi|http://gephi.org/], an interactive graph explorer.

Many other file importers are custom-made for particular algorithms, for example:
* The various Lucene vector creators

Some programs dump text versions of SequenceFiles for eyeballing:
* ClusterDumper
* ConfusionMatrixDumper
* MatrixDumper
* SequenceFileDumper
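This page does not describe the constructor or options of the CSV importer mentioned above. As a rough illustration of what a CSV-to-vector importer does, here is a plain-Java sketch that assumes a purely numeric file with no header row and no quoted fields. The class {{CsvVectorSketch}} and its methods are hypothetical and are not part of Mahout.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Mahout's CSVVectorIterator): turns each line of a
 * purely numeric CSV file into a dense vector. Assumes no header row, no
 * quoted fields and no missing values.
 */
final class CsvVectorSketch {

    private CsvVectorSketch() {}

    /** Parses one CSV line such as "1.0,2,3.5" into a double[]. */
    static double[] parseLine(String line) {
        // limit -1 keeps trailing empty fields, so "1,2," fails loudly instead of silently dropping a column
        String[] fields = line.split(",", -1);
        double[] vector = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            vector[i] = Double.parseDouble(fields[i].trim());
        }
        return vector;
    }

    /** Reads every non-blank line from the reader and returns one vector per line. */
    static List<double[]> readAll(Reader in) throws IOException {
        List<double[]> vectors = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    vectors.add(parseLine(line));
                }
            }
        }
        return vectors;
    }
}
```

For example, reading the text "1,2\n\n3,4\n" through {{readAll}} yields two vectors, [1.0, 2.0] and [3.0, 4.0]; the blank line is skipped. A real importer would also need to handle headers, quoting and sparse output, which this sketch deliberately leaves out.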
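To show roughly what a GraphML cluster export looks like, the following plain-Java sketch writes cluster assignments as a GraphML document that a graph explorer such as Gephi can open. The node/edge layout is an assumption made for illustration (one node per cluster, one node per point, and an edge from each point to its cluster); it is not necessarily what *GraphMLClusterWriter* produces, and the class {{GraphMLSketch}} is hypothetical.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Mahout's GraphMLClusterWriter): writes cluster
 * assignments as GraphML. Assumed layout: one node per cluster ("c" + id),
 * one node per point ("p" + id), and an undirected edge from each point to
 * its cluster. Assumes hard clustering, i.e. each point id appears in exactly
 * one cluster; otherwise duplicate node ids would be emitted.
 */
final class GraphMLSketch {

    private GraphMLSketch() {}

    /** clusters maps a cluster id to the ids of the points assigned to it. */
    static String toGraphML(Map<String, List<String>> clusters) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\">\n");
        xml.append("  <graph id=\"clusters\" edgedefault=\"undirected\">\n");
        for (Map.Entry<String, List<String>> cluster : clusters.entrySet()) {
            // prefixes keep cluster ids and point ids from colliding
            String clusterNode = "c" + escape(cluster.getKey());
            xml.append("    <node id=\"").append(clusterNode).append("\"/>\n");
            for (String point : cluster.getValue()) {
                String pointNode = "p" + escape(point);
                xml.append("    <node id=\"").append(pointNode).append("\"/>\n");
                xml.append("    <edge source=\"").append(pointNode)
                   .append("\" target=\"").append(clusterNode).append("\"/>\n");
            }
        }
        xml.append("  </graph>\n");
        xml.append("</graphml>\n");
        return xml.toString();
    }

    /** Minimal XML attribute escaping; '&' is replaced first so it is not double-escaped. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("\"", "&quot;")
                .replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

With a single cluster "0" containing points "a" and "b", the output contains the nodes c0, pa and pb plus two edges pointing at c0. Attaching cluster centers or distances as GraphML {{<data>}} attributes would be a natural extension, but that is beyond this sketch.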
