I had a chance to get feedback last night from a few Old Street
startups using Mahout. The overall comments were of course positive --
it provides a solution that's at least 80% ready-to-go and saves a
great deal of trial and error in getting towards something working.

The problems I heard were similar to last time. The jobs are uneven
and not standard, so each has its own peculiar learning curve. There
are evidently still a number of invisible assumptions baked into the
code about the file structure and environment too -- I heard again
that repeated use of "new Configuration()" around the code breaks
things. The experience of Mahout seemed to be one of weeks of trial
and error, some of which has to do with understanding the machinery of
Hadoop of course. Finally, there was a group that had been using the
LDA implementation but abandoned it over scalability concerns -- I
didn't get more detail on that.

I do reiterate that there is, at heart, a significant and eager
developer audience that is finding all this really useful but is
burning up a lot of energy just getting started. That's just the
nature of this beast at version 0.x, but I think it once again
underscores that the need is not for new algorithms, but for cleaning
up, fixing, documenting and streamlining what's already there.
