Nice and simple API! Some things to comment:
- How can we manage discrete attributes, for example attribute class: "+", "-"?
- In sparse instances, is the performance of a map similar to the
  performance of two arrays, one for indices and one for values?

Albert

On Sat, Jan 24, 2015 at 1:38 AM, Matthieu Morel <[email protected]> wrote:
> I took a shot at drafting a simplified API for instances.
> https://github.com/matthieumorel/samoa/tree/new-instances
>
> As pointed out in this thread, the current API is too exhaustive, too
> tied to a specific implementation, and too tied to WEKA/MOA APIs.
>
> In addition, I feel the header/information does not belong to the
> instance. This is something which is used when parsing arff files
> where static information about the stream is available from the start.
> But for a real streaming use case, we should not make such an assumption.
> Whatever is known at the beginning should be loaded by the topology,
> but not included in the instances. Arff files can still be loaded and
> generate instances in the new format. Only the headers should be
> parsed separately.
>
> This proposal is a draft and single label only. It should be easy to
> add functionality suggested by Albert for multi labels.
>
> Feel free to comment!
>
> Matthieu
>
> On Wed, Jan 21, 2015 at 2:31 AM, Albert Bifet <[email protected]> wrote:
> > 1/ Learners such as decision trees can deal with new instances that
> > arrive with more label classes. New instances can arrive with new headers.
> >
> > 2/ To change class labels dynamically, we need to add a method
> > "setValue(int, string)" in the Attribute class to dynamically add new
> > attribute values.
> >
> > 3/ The current design is being compatible with the methods in WEKA
> > instances. It could be nice to have a fresher design. I will need some
> > help to have a simplified and fresher design of the instances as I'm a
> > bit conditioned by the previous instance usage :)
> >
> > Thanks,
> >
> > Albert
> >
> > On Wed, Jan 21, 2015 at 2:33 AM, Olivier Van Laere
> > <[email protected]> wrote:
> >> Hey Matthieu,
> >>
> >>> On Jan 20, 2015, at 1:47 AM, Matthieu Morel <[email protected]> wrote:
> >>>
> >>> I'm confused. From what I see the number of classes is currently fixed
> >>> in the instance header. As if it was static. I suppose you can work
> >>> around that limitation with some hacks but I want to use a clean API
> >>> for that.
> >>>
> >>> Or is there a recommended way I'm missing?
> >>
> >> Ah, I think I remember now what happened. As far as I encountered this
> >> situation, the data had, say, an .arff format with a header stating the
> >> number of class values, and the instance header was read from that, while
> >> the instances were then read by the line.
> >>
> >> I worked around that by just storing the class label seen in the
> >> instances on the fly when building a model, and ignored that field of the
> >> instance header. Sorry for the confusion!
> >>
> >> Cheers,
> >> Olivier
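
For reference, the workaround Olivier describes (recording class labels as they appear in the stream instead of trusting the header) could be sketched roughly as below. It also shows one possible way to handle discrete values such as "+" and "-". This is only an illustration: `LabelIndex` and its methods are hypothetical names, not part of the SAMOA, MOA or WEKA APIs, and the draft branch may handle this differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not an existing SAMOA class): assigns a stable integer
// index to each nominal value, e.g. "+" or "-", the first time it is seen,
// so the set of class labels does not have to be fixed in a header.
final class LabelIndex {
    private final Map<String, Integer> indexByLabel = new HashMap<>();
    private final List<String> labelByIndex = new ArrayList<>();

    // Returns the index of the label, registering the label if it is new.
    int indexOf(String label) {
        Integer existing = indexByLabel.get(label);
        if (existing != null) {
            return existing;
        }
        int index = labelByIndex.size();
        indexByLabel.put(label, index);
        labelByIndex.add(label);
        return index;
    }

    // Returns the label that was registered under the given index.
    String labelOf(int index) {
        return labelByIndex.get(index);
    }

    // Number of distinct labels seen so far.
    int size() {
        return labelByIndex.size();
    }
}
```

A learner could call `indexOf` on the class value of every incoming instance and grow its per-class statistics whenever `size()` increases. Indices already handed out never change, so earlier statistics stay valid when a new label shows up.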
