As an inexpert implementer, I would love to see a few datasets and
loose definitions of correctness for various operations. Take matrix
decomposition as an example. There are several algorithms and use
cases, but they all do the same thing: "derive a bunch of matrices and
(perhaps) vectors from a starter dataset, then multiply the derived
matrices to re-form the starter dataset". Except, of course, the result
only matches to within an epsilon. Such a test would have several
criteria for correctness, for example "the Frobenius norms of the
starter and output matrices will have a ratio between 0.9 and 1.1".
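
To make that concrete, here is a rough sketch of such a test in plain
Java. It is not Mahout code: the class DecompositionCheck and its
methods are names I made up for illustration, and the QR step is a
textbook modified Gram-Schmidt standing in for whichever decomposition
is actually under test. It assumes a small dense matrix with full
column rank.

```java
// Sketch only: all names here are hypothetical, not part of Mahout's API.
final class DecompositionCheck {

  // Frobenius norm: square root of the sum of squared entries.
  static double frobeniusNorm(double[][] m) {
    double sum = 0.0;
    for (double[] row : m) {
      for (double v : row) {
        sum += v * v;
      }
    }
    return Math.sqrt(sum);
  }

  // Plain triple-loop matrix product a * b.
  static double[][] multiply(double[][] a, double[][] b) {
    int rows = a.length, inner = b.length, cols = b[0].length;
    double[][] c = new double[rows][cols];
    for (int i = 0; i < rows; i++) {
      for (int j = 0; j < cols; j++) {
        double s = 0.0;
        for (int t = 0; t < inner; t++) {
          s += a[i][t] * b[t][j];
        }
        c[i][j] = s;
      }
    }
    return c;
  }

  // QR by modified Gram-Schmidt. Assumes full column rank (no zero pivots).
  // Returns {Q, R} with Q of size m x n and R upper triangular, n x n.
  static double[][][] qr(double[][] a) {
    int m = a.length, n = a[0].length;
    double[][] q = new double[m][];
    double[][] r = new double[n][n];
    for (int i = 0; i < m; i++) {
      q[i] = a[i].clone();
    }
    for (int j = 0; j < n; j++) {
      double norm = 0.0;
      for (int i = 0; i < m; i++) {
        norm += q[i][j] * q[i][j];
      }
      norm = Math.sqrt(norm);
      r[j][j] = norm;
      for (int i = 0; i < m; i++) {
        q[i][j] /= norm;
      }
      // Remove the component along column j from every later column.
      for (int k = j + 1; k < n; k++) {
        double dot = 0.0;
        for (int i = 0; i < m; i++) {
          dot += q[i][j] * q[i][k];
        }
        r[j][k] = dot;
        for (int i = 0; i < m; i++) {
          q[i][k] -= dot * q[i][j];
        }
      }
    }
    return new double[][][] {q, r};
  }

  // The loose criterion from above: ratio of Frobenius norms within [lo, hi].
  static boolean normRatioWithin(double[][] starter, double[][] rebuilt,
                                 double lo, double hi) {
    double ratio = frobeniusNorm(rebuilt) / frobeniusNorm(starter);
    return ratio >= lo && ratio <= hi;
  }

  public static void main(String[] args) {
    double[][] starter = {{4, 1, 2}, {1, 3, 0}, {2, 0, 5}, {1, 1, 1}};
    double[][][] parts = qr(starter);
    double[][] rebuilt = multiply(parts[0], parts[1]);
    System.out.println(normRatioWithin(starter, rebuilt, 0.9, 1.1));
  }
}
```

A norm ratio near 1 does not by itself prove the two matrices match, so
a stricter criterion could also bound the Frobenius norm of the
difference (starter minus rebuilt) by an epsilon.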

These tests would give a great boost to Mahout's credibility as a
suite of finished tools.

On Fri, Dec 24, 2010 at 7:52 PM, Ted Dunning <[email protected]> wrote:
> I started this some months ago and have been maintaining an lud branch on my
> github mahout mirror.
>
> https://github.com/tdunning/LatentFactorLogLinear/tree/lud
>
> My guess is that a substantial rewrite is in order.  Keeping the framework
> is fine, but there is a lot of breaking of abstraction going on in that
> code.
>
> In the meantime, I did test QR decomposition.  This is usually a better
> choice than LUD in any case for linear solutions since you can handle least
> squares solutions so easily that way.
>
> On Fri, Dec 24, 2010 at 12:54 PM, Sebastian Schelter <[email protected]> wrote:
>
>> I'd be happy to see LUDecomposition be tested and undeprecated as I
>> don't trust my mathematical skills enough to do it myself.
>>
>> --sebastian
>>
>> On 24.12.2010 21:37, Grant Ingersoll wrote:
>> > I'd love to see some benchmarking done of the various algorithms, too,
>> > just to add more to the wish list.
>> >
>> > On Dec 23, 2010, at 2:06 PM, Walter Gillett wrote:
>> >
>> >> I'm interested in learning about and contributing to Mahout and seems to me that
>> >> creating some unit tests would be a good place to start. Looking at the code
>> >> coverage stats, appears that there's lots of work to do, e.g., there appears to
>> >> be no coverage for the Taste module. Should I just pick something
>> >> random/interesting to start with, or is there some high-priority part of the
>> >> code that you would recommend to work on?
>> >>
>> >> Walter Gillett
>> >>
>> >>
>> >>
>> >
>> > --------------------------
>> > Grant Ingersoll
>> > http://www.lucidimagination.com
>> >
>>
>>
>



-- 
Lance Norskog
[email protected]
