Pat Ferrel created MAHOUT-1568:
----------------------------------

             Summary: Build an I/O model that can replace sequence files for 
import/export
                 Key: MAHOUT-1568
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1568
             Project: Mahout
          Issue Type: New Feature
          Components: CLI
         Environment: Scala, Spark
            Reporter: Pat Ferrel
            Assignee: Pat Ferrel


Implement mechanisms to read and write data from/to flexible stores. These will 
support tuple streams and DRMs, with extensions that allow keeping user-defined 
values for IDs. The mechanism can, in some sense, replace Sequence Files for 
import/export and will make the operation much easier for the user, in many 
cases by directly consuming their input files.

Start with delimited text files for input/output in the Spark version of 
ItemSimilarity.
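
To make the idea concrete, here is a minimal sketch of reading delimited text 
while keeping user-defined IDs. It is written in plain Python for brevity and 
is not the proposed Scala/Spark code. The names IdDictionary, read_tuples and 
write_rows, the tab delimiter, the two-column (row ID, column ID) layout, and 
the "id:value" output format are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


class IdDictionary:
    """Hypothetical bidirectional map: user-defined string ID <-> integer index."""

    def __init__(self):
        self._index_of = {}
        self._id_of = []

    def index(self, external_id):
        # Assign the next free index the first time an ID is seen.
        if external_id not in self._index_of:
            self._index_of[external_id] = len(self._id_of)
            self._id_of.append(external_id)
        return self._index_of[external_id]

    def external_id(self, index):
        return self._id_of[index]

    def __len__(self):
        return len(self._id_of)


def read_tuples(text, delimiter="\t"):
    """Parse delimited (row ID, column ID) lines into a sparse row map."""
    row_ids, col_ids = IdDictionary(), IdDictionary()
    rows = defaultdict(dict)  # row index -> {column index: value}
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        r = row_ids.index(fields[0].strip())
        c = col_ids.index(fields[1].strip())
        rows[r][c] = 1.0  # assumed: presence of a tuple means an interaction
    return dict(rows), row_ids, col_ids


def write_rows(rows, row_ids, col_ids, delimiter="\t"):
    """Export sparse rows as text, restoring the original IDs."""
    lines = []
    for r in sorted(rows):
        cols = " ".join(
            f"{col_ids.external_id(c)}:{v}" for c, v in sorted(rows[r].items())
        )
        lines.append(f"{row_ids.external_id(r)}{delimiter}{cols}")
    return "\n".join(lines)
```

The two dictionaries are the part that matters here: the math only ever sees 
integer indices, and the user's own IDs are restored on export, so the input 
file can be consumed directly without a separate conversion step.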

A proposed implementation is running with ItemSimilarity on Spark and is 
documented on the GitHub wiki here: https://github.com/pferrel/harness/wiki

Comments are appreciated.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
