A while back, Karl Wettin said[1]:

> Coming from Weka's immensely bloated ARFF instances implementation,
> I would like to see a really, really, really abstract solution. So if
> possible I would prefere that collections was something introduced in
> a layer further up. That way the consumer gets to choose what solution
> is best at any given environment. JFC, raw data in a NIO-buffer, some
> sort of stream, or what not.

I think this is a critical area for pre-coding design. I made a section
on the main Wiki page for Design, and added under it a link to a new
whiteboard page for discussing this topic:

<http://cwiki.apache.org/confluence/display/MAHOUT/Collection(De-)Serialization>

(So far, I've only put ARFF links there.)

Karl, can you elaborate on what you think is wrong with Weka's
instances implementation?

Steve

[1] <http://ml.grantingersoll.com/pipermail/ml-grantingersoll.com/2007-July/000034.html>
