Thanks Gianmarco,

1/ Range contains the information about which attributes are inputs and which are outputs. Each instance has an InstancesHeader field that contains an AttributesInformation object.

2/ If there is no metadata information, then all attributes are numeric, right? This seems reasonable.

- InstancesHeader contains an InstanceInformation object. We may use InstanceInformation instead of InstancesHeader.
- Yes, AttributesInformation can be modified at runtime, by adding attributes and attribute values.

Cheers,
Albert

On Tue, Jan 13, 2015 at 9:18 PM, Gianmarco De Francisci Morales <[email protected]> wrote:

> Thanks Albert.
>
> I have a couple of questions.
>
> 1/ how do we distinguish between input and output attributes?
> In particular, let's take as an example the default single-label
> classification.
> I guess that is the role of Range.
> However, do we have to serialize it with every instance we send?
>
> 2/ to distinguish between numeric and categorical we need some metadata,
> which I guess goes into InstancesHeader.
> I am fine with keeping it also for compatibility with MOA, and we might use
> it if we have access to it.
> However, I would prefer algorithms not to rely on it, and consider the
> presence of metadata optional.
>
> Some other points:
> - what's the difference between InstanceInformation and InstancesHeaders
> - can the AttributesInformation be modified at runtime? Or is it statically
> set for the whole duration of the algorithm?
>
> Cheers,
>
> --
> Gianmarco
>
> On 10 January 2015 at 04:26, Albert Bifet <[email protected]> wrote:
>
> > Hi all,
> >
> > This is a short explanation of the new instances of SAMOA.
> >
> > https://github.com/abifet/moa/tree/master/moa/src/main/java/com/yahoo/labs/samoa/instances
> >
> > Instances will be much simpler than the current implementation. They
> > can be dense or sparse, and they contain only one array (or two for
> > sparse) with all the attribute values. In the current implementation
> > we have two arrays, one for input values and another for output values.
> >
> > The main changes are two:
> >
> > 1/ All instances are going to be multi-label, that means they have
> > input and output attributes, and we can call their values with
> > getInputValue(i) and getOutputValue(i).
> >
> > 2/ Attributes are numeric by default, so we only keep information of
> > discrete attributes (values). For example if we have one million
> > numeric attributes, we will not need to store attribute information of
> > these one million numeric attributes.
> >
> > Basically, we have:
> >
> > - Instance: interface
> > - MultiLabelInstance: interface (empty interface that extends Instance)
> > - InstanceImpl extends MultiLabelInstance: implementation of Instance.
> >   Contains
> >   - InstanceData
> >   - InstancesHeader
> > - DenseInstance extends InstanceImpl
> > - SparseInstance extends InstanceImpl
> >
> > - Instances: a list of instances and an InstanceInformation object
> > - InstancesHeader extends Instances
> >
> > - InstanceData: interface
> > - DenseInstanceData implements InstanceData
> > - SparseInstanceData implements InstanceData
> >
> > - InstanceInformation contains name, attribute information and
> >   attributes to predict.
> > - AttributesInformation contains two lists of Attributes (indices and
> >   values) for non-numerical attributes. Numerical attributes are the
> >   default.
> > - Range: attributes to predict
> >
> > Cheers,
> >
> > Albert
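
To make the layout discussed in this thread concrete, here is a minimal Java sketch. It is not the code in the linked repository: the class names (OutputRange, SketchHeader, SketchDenseInstance, InstanceSketchDemo) are hypothetical, and the sketch assumes the output attributes occupy one contiguous block of the value array, which the thread does not state. Only getInputValue(i) and getOutputValue(i) are names taken from the thread. It shows a single array holding both input and output values, a range that separates them, and metadata kept only for nominal attributes so that everything else defaults to numeric.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for Range: outputs occupy positions [start, end) of the value array. */
final class OutputRange {
    private final int start;
    private final int end;

    OutputRange(int start, int end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid range");
        }
        this.start = start;
        this.end = end;
    }

    int start() { return start; }
    int size() { return end - start; }
}

/** Hypothetical header object that many instances can reference. */
final class SketchHeader {
    private final OutputRange outputs;
    // Only nominal attributes are recorded; an absent index means "numeric".
    private final Map<Integer, List<String>> nominalValues = new HashMap<>();

    SketchHeader(OutputRange outputs) { this.outputs = outputs; }

    OutputRange outputs() { return outputs; }

    // Metadata can be added while the algorithm is running.
    void declareNominal(int attributeIndex, List<String> values) {
        nominalValues.put(attributeIndex, values);
    }

    boolean isNumeric(int attributeIndex) {
        return !nominalValues.containsKey(attributeIndex);
    }
}

/** Dense instance: one array holds input and output values together. */
final class SketchDenseInstance {
    private final double[] values;
    private final SketchHeader header;

    SketchDenseInstance(double[] values, SketchHeader header) {
        OutputRange r = header.outputs();
        if (r.start() + r.size() > values.length) {
            throw new IllegalArgumentException("output range exceeds value array");
        }
        this.values = values.clone();
        this.header = header;
    }

    int numInputs() { return values.length - header.outputs().size(); }
    int numOutputs() { return header.outputs().size(); }

    double getInputValue(int i) {
        if (i < 0 || i >= numInputs()) {
            throw new IndexOutOfBoundsException("input index " + i);
        }
        OutputRange r = header.outputs();
        // Skip over the output block when the requested input lies after it.
        int position = i < r.start() ? i : i + r.size();
        return values[position];
    }

    double getOutputValue(int i) {
        if (i < 0 || i >= numOutputs()) {
            throw new IndexOutOfBoundsException("output index " + i);
        }
        return values[header.outputs().start() + i];
    }
}

final class InstanceSketchDemo {
    public static void main(String[] args) {
        // Three inputs followed by one class attribute at position 3.
        SketchHeader header = new SketchHeader(new OutputRange(3, 4));
        header.declareNominal(3, Arrays.asList("no", "yes"));

        SketchDenseInstance instance =
                new SketchDenseInstance(new double[] {1.5, 0.0, 3.0, 1.0}, header);

        System.out.println("input 2  = " + instance.getInputValue(2));
        System.out.println("output 0 = " + instance.getOutputValue(0));
        System.out.println("attribute 0 numeric? " + header.isNumeric(0));
        System.out.println("attribute 3 numeric? " + header.isNumeric(3));
    }
}
```

In this sketch every instance holds a reference to the same header object instead of its own copy of the range. Whether the range has to be serialized with each instance sent over the wire is a separate question that the sketch does not answer.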
