Space: Apache Mahout (https://cwiki.apache.org/confluence/display/MAHOUT)
Page: File Format Integrations 
(https://cwiki.apache.org/confluence/display/MAHOUT/File+Format+Integrations)

Added by Lance Norskog:
---------------------------------------------------------------------
There are several importers and exporters for common file formats:
* *org.apache.mahout.utils.vectors.arff.Driver* imports 
[ARFF|http://www.cs.waikato.ac.nz/ml/weka/arff.html] files into vectors.
* *CSVVectorIterator* imports CSV files into vectors.
* *MailProcessor* parses text-only mailboxes into a SequenceFile with a numbered 
key and the text body as the value.

These are low-level classes, not Hadoop file I/O classes.
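
As a rough illustration of what "importing CSV into vectors" involves, the sketch below turns each line of numeric CSV text into a dense array of doubles. It is not the CSVVectorIterator implementation and does not use any Mahout or Hadoop classes. The class name CsvVectorSketch and its methods are hypothetical, and it assumes every field is a plain number with no quoting or header row.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; this is NOT Mahout's CSVVectorIterator.
// Assumes purely numeric fields, no quoted values, and no header row.
class CsvVectorSketch {

    // Parse one comma-separated line of numbers into a dense vector.
    static double[] parseLine(String line) {
        String[] fields = line.split(",");
        double[] vector = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            vector[i] = Double.parseDouble(fields[i].trim());
        }
        return vector;
    }

    // Convert every non-blank line of CSV text into a vector.
    static List<double[]> readAll(String csvText) {
        List<double[]> vectors = new ArrayList<>();
        for (String line : csvText.split("\\r?\\n")) {
            if (!line.trim().isEmpty()) {
                vectors.add(parseLine(line));
            }
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<double[]> vectors = readAll("1.0, 2.5, 3\n4, 5, 6.25\n");
        System.out.println(vectors.size() + " vectors read");
    }
}
```

A real importer would also need to deal with quoting, missing values, and non-numeric columns, which this sketch deliberately ignores.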

The Clustering package has a couple of exporters for standard formats:
* *GraphMLClusterWriter* saves cluster data in the 
[GraphML|http://graphml.graphdrawing.org/] format.
* *CSVClusterWriter* saves clusters in a CSV-based format.

Both of these formats are read by the [Gephi|http://gephi.org/] program, an 
interactive graph explorer. 
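
To give a feel for the kind of file a graph explorer can load, here is a minimal sketch that writes clusters as GraphML, with one node per cluster, one node per member, and an edge linking each member to its cluster. It is not the GraphMLClusterWriter implementation, and the layout that class actually emits may differ. The class name GraphMlSketch, the map-based input, and the graph id "clusters" are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; this is NOT Mahout's GraphMLClusterWriter.
// Assumes cluster ids and member ids are unique strings.
class GraphMlSketch {

    // Escape characters that are not allowed inside an XML attribute value.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Render clusters (cluster id -> member ids) as a GraphML document.
    static String toGraphMl(Map<String, List<String>> clusters) {
        StringBuilder out = new StringBuilder();
        out.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        out.append("<graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\">\n");
        out.append("  <graph id=\"clusters\" edgedefault=\"undirected\">\n");
        for (Map.Entry<String, List<String>> entry : clusters.entrySet()) {
            String clusterId = escape(entry.getKey());
            out.append("    <node id=\"").append(clusterId).append("\"/>\n");
            for (String member : entry.getValue()) {
                String memberId = escape(member);
                out.append("    <node id=\"").append(memberId).append("\"/>\n");
                out.append("    <edge source=\"").append(clusterId)
                   .append("\" target=\"").append(memberId).append("\"/>\n");
            }
        }
        out.append("  </graph>\n");
        out.append("</graphml>\n");
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> clusters = new LinkedHashMap<>();
        clusters.put("c0", Arrays.asList("p1", "p2"));
        clusters.put("c1", Arrays.asList("p3"));
        System.out.print(toGraphMl(clusters));
    }
}
```

Saving the printed output to a file with a .graphml extension should give something a GraphML-aware tool can open, though that has not been verified against any particular Gephi version.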

There are many file importers which are custom-made for particular algorithms:
* The various Lucene vector creators

Some programs dump text versions of SequenceFiles for eyeballing:
* *ClusterDumper*
* *ConfusionMatrixDumper*
* *MatrixDumper*
* *SequenceFileDumper*
