Which is why it’s an import/export issue.

On Apr 15, 2014, at 5:48 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
On Tue, Apr 15, 2014 at 10:58 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

> The statement "There is not, nor do i think there will be a way to
> run this stuff with CLI" seems unduly misleading. Really, does anyone
> second this?
>
> There will be Scala scripts to drive this stuff and yes, even from the CLI.
> Do you imagine that every Mahout USER will be a Scala + Mahout DSL
> programmer? That may be fine for committers, but users will be PHP devs, Ruby
> devs, Python or Java devs, maybe even a few C# devs. I think you are
> confusing Mahout DEVS with USERS. Few users are R devs moving into
> production work; they are production engineers moving into ML who want a
> black box. They will need a language-agnostic way to drive Mahout. Making
> statements like this only confuses potential users and drives them away to no
> purpose. I’m happy for the nascent Mahout-Scala shell, but it’s not in the
> typical user’s world view.

Yes, ultimately there may need to be command line programs of various sorts, but the fact is, we need to make sure that we avoid files as the API for moving large amounts of data. That means that we have to have some way of controlling the persistence of in-memory objects, and in many cases, that means that processing chains will not typically be integrated at the level of command line programs.

Dmitriy's comment about R is apropos. You can put scripts together for various end-user purposes, but you don't have a CLI for every R command. Nor for every Perl, Python, or PHP command either.

To the extent we have in-memory persistence across the lifetime of multiple driver programs, a sort of CLI will be possible. I know that H2O will do that, but I am not entirely clear on the lifetime of RDDs in Spark relative to Mahout DSL programs.

Regardless of possibility, I don't expect a CLI to be the primary integration path for these new capabilities.