I'd like to add: solving the ClassNotFoundException problems that occur with third-party jars in some jobs.
I experimented with having seq2sparse upload a third-party jar containing an analyzer and add it to the DistributedCache. The upload works, but I haven't yet got it working inside the Mappers. I have some code lying around for this that can be used as a starting point, including a separate project that depends on Mahout and on an analyzer, for testing things out.

Another thing would be adding or improving the integration tools, for example adding a mysql2seq to cluster text from a SQL database.

On Sat, Feb 11, 2012 at 8:01 PM, Jeff Eastman <j...@windwardsolutions.com> wrote:
> Now that 0.6 is in the box, it seems a good time to start thinking about
> 0.7, from a high level goal perspective at least. Here are a couple that
> come to mind:
>
> Target code freeze date August 1, 2012
> Get Jenkins working for us again
> Complete clustering refactoring and classification convergence

What kind of clustering refactoring do you mean here? I did some work on creating bean configurations in the past (MAHOUT-612). I underestimated the amount of work required to do the entire refactoring. If this can be contributed and committed on a per-job basis, I would like to help out.

> ...