I’ve got a very early version of a CLI running with delimited text file import, 
using the Spark cooccurrence code as an example. It’s a slightly rejiggered 
version of some code Sebastian wrote. The GitHub project wiki explains what it 
does and how the code is structured. You can look at the code too, but it’s 
certainly not ready for prime time; it doesn’t even write to files yet.

It does touch slightly on the dataframe issue, since there is an object that is 
framed by user-specified IDs, but none of the R-like behavior is there and may 
never be. For now it serves import/export requirements and can be replaced with 
some other dataframe when one is available, if that makes sense.
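
To make "framed by user-specified IDs" a bit more concrete, here is a minimal 
sketch in Python. It is not the project's code: the names IdIndex, 
import_delimited and export_delimited are hypothetical, and the two-column 
"rowID,columnID" input layout is an assumption. The idea is only that external 
string IDs are mapped to dense integer indices on import and mapped back on 
export.

```python
from collections import defaultdict


class IdIndex:
    """Hypothetical two-way map between external string IDs and int indices."""

    def __init__(self):
        self._to_index = {}
        self._to_id = []

    def index_of(self, external_id):
        # Assign the next free index the first time an ID is seen.
        if external_id not in self._to_index:
            self._to_index[external_id] = len(self._to_id)
            self._to_id.append(external_id)
        return self._to_index[external_id]

    def id_of(self, index):
        return self._to_id[index]

    def __len__(self):
        return len(self._to_id)


def import_delimited(lines, delimiter=","):
    """Parse 'rowID<delim>columnID' lines (assumed layout) into sparse rows."""
    row_ids, col_ids = IdIndex(), IdIndex()
    rows = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row_id, col_id = (field.strip() for field in line.split(delimiter)[:2])
        rows[row_ids.index_of(row_id)].add(col_ids.index_of(col_id))
    return rows, row_ids, col_ids


def export_delimited(rows, row_ids, col_ids, delimiter=","):
    """Translate int indices back to the user's IDs on the way out."""
    return [
        delimiter.join((row_ids.id_of(r), col_ids.id_of(c)))
        for r in sorted(rows)
        for c in sorted(rows[r])
    ]


sample = ["u1,iphone", "u1,ipad", "u2,ipad"]
rows, row_ids, col_ids = import_delimited(sample)
round_trip = export_delimited(rows, row_ids, col_ids)
```

Keeping the two ID maps next to the integer-indexed rows is what lets the math 
run on plain indices while import and export still speak the user's IDs. Any 
richer dataframe could replace this later as long as it keeps that round trip.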

Any comments on the design are appreciated.
https://github.com/pferrel/harness
https://issues.apache.org/jira/browse/MAHOUT-1541
