I’ve got a very early version of a CLI running with delimited text file import, using the Spark cooccurrence code as an example. It’s a slightly reworked version of some code Sebastian wrote. The GitHub project wiki explains what it does and how the code is structured. You can look at the code too, but it’s certainly not ready for prime time; it doesn’t even write to files yet.
It does touch slightly on the dataframe issue, since there is an object that is framed by user-specified IDs, but none of the R-like behavior is there and it may never be. For now it serves the import/export requirements and can be replaced with some other dataframe when one is available, if that makes sense. Any comments on the design are appreciated. https://github.com/pferrel/harness https://issues.apache.org/jira/browse/MAHOUT-1541
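
To make "an object that is framed by user-specified IDs" a bit more concrete, here is a minimal sketch of the idea in plain Python. It is not the code from the harness project (which is Scala on Spark); the names IndexedMatrix and read_delimited, and the two-column "rowID,columnID" input layout, are assumptions made up purely for illustration.

```python
import csv
import io


class IndexedMatrix:
    """Hypothetical sparse matrix addressed by external string IDs."""

    def __init__(self):
        self.row_ids = {}  # external row ID -> integer row index
        self.col_ids = {}  # external column ID -> integer column index
        self.cells = {}    # (row index, column index) -> value

    @staticmethod
    def _index(mapping, key):
        # Assign the next free integer index the first time an ID is seen.
        return mapping.setdefault(key, len(mapping))

    def set(self, row_id, col_id, value=1.0):
        r = self._index(self.row_ids, row_id)
        c = self._index(self.col_ids, col_id)
        self.cells[(r, c)] = value

    def get(self, row_id, col_id):
        # Unknown IDs and unset cells both read as 0.0.
        r = self.row_ids.get(row_id)
        c = self.col_ids.get(col_id)
        return self.cells.get((r, c), 0.0)


def read_delimited(text, delimiter=","):
    """Build an IndexedMatrix from lines of the assumed form 'rowID,columnID'."""
    matrix = IndexedMatrix()
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        matrix.set(fields[0].strip(), fields[1].strip())
    return matrix


if __name__ == "__main__":
    m = read_delimited("u1,itemA\nu1,itemB\nu2,itemA\n")
    print(m.row_ids, m.col_ids, m.get("u1", "itemB"))
```

The point of keeping the ID dictionaries next to the integer-indexed cells is that the math can run on plain indices while import and export translate back to the user's own IDs, so the wrapper can be swapped for a real dataframe later without touching the math.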
