Re: New Instances

Albert Bifet Tue, 13 Jan 2015 17:37:26 -0800

Thanks Gianmarco,

1/ Range holds the information about which attributes are inputs and which
are outputs. Each instance has an InstancesHeader field that contains an
AttributesInformation object.
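
A minimal sketch of how a range of output attributes can split a single flat array of values into inputs and outputs. The names `OutputRange` and `FlatInstance` are hypothetical and do not come from SAMOA. The sketch also assumes, for simplicity, that the output attributes form one contiguous block of indices.

```java
/**
 * Hypothetical sketch, not the SAMOA Range class: the output attributes are
 * assumed to occupy one contiguous block of indices [start, end); every other
 * index is an input attribute.
 */
final class OutputRange {
    private final int start; // first output index (inclusive)
    private final int end;   // end of the output block (exclusive)

    OutputRange(int start, int end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid range [" + start + ", " + end + ")");
        }
        this.start = start;
        this.end = end;
    }

    int numOutputs() { return end - start; }

    /** Position of the i-th output attribute in the flat value array. */
    int outputIndex(int i) { return start + i; }

    /** Position of the i-th input attribute, skipping over the output block. */
    int inputIndex(int i) { return i < start ? i : i + numOutputs(); }
}

/** Hypothetical instance that keeps every attribute value in one array. */
final class FlatInstance {
    private final double[] values;
    private final OutputRange range; // a shared reference, not a per-instance copy

    FlatInstance(double[] values, OutputRange range) {
        this.values = values.clone();
        this.range = range;
    }

    int numInputs() { return values.length - range.numOutputs(); }

    double getInputValue(int i) { return values[range.inputIndex(i)]; }

    double getOutputValue(int i) { return values[range.outputIndex(i)]; }
}

class RangeDemo {
    public static void main(String[] args) {
        // Single-label classification: attributes 0..2 are inputs, 3 is the class.
        OutputRange range = new OutputRange(3, 4);
        FlatInstance inst = new FlatInstance(new double[] {0.5, 1.5, 2.5, 1.0}, range);
        System.out.println(inst.getInputValue(2));  // 2.5
        System.out.println(inst.getOutputValue(0)); // 1.0
        System.out.println(inst.numInputs());       // 3
    }
}
```

In this sketch the range is held by reference, so many instances built from the same header can share one object. How that shared information would be serialized is left open here.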


2/ If there is no metadata, then all attributes are treated as numeric,
right? That seems reasonable.
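
A minimal sketch of the "numeric unless stated otherwise" rule. It assumes that metadata is a map from attribute index to the list of nominal values. `AttributeTypes` and its methods are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: only nominal attributes are recorded, keyed by
 * attribute index. Any index without an entry is numeric, so with no
 * metadata at all every attribute is numeric.
 */
final class AttributeTypes {
    private final Map<Integer, List<String>> nominalValues;

    /** No metadata available: all attributes are numeric. */
    AttributeTypes() {
        this(Collections.emptyMap());
    }

    AttributeTypes(Map<Integer, List<String>> nominalValues) {
        this.nominalValues = new HashMap<>(nominalValues);
    }

    boolean isNumeric(int attributeIndex) {
        return !nominalValues.containsKey(attributeIndex);
    }

    /** Number of attributes for which any information is stored. */
    int storedEntries() {
        return nominalValues.size();
    }
}

class AttributeTypesDemo {
    public static void main(String[] args) {
        AttributeTypes noMetadata = new AttributeTypes();
        System.out.println(noMetadata.isNumeric(999_999)); // true

        // One nominal attribute (index 2); all other indices stay numeric.
        AttributeTypes withMetadata = new AttributeTypes(
                Collections.singletonMap(2, Arrays.asList("red", "green")));
        System.out.println(withMetadata.isNumeric(2));    // false
        System.out.println(withMetadata.isNumeric(7));    // true
        System.out.println(withMetadata.storedEntries()); // 1
    }
}
```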

- InstancesHeader contains an InstanceInformation object. We could use
InstanceInformation directly instead of InstancesHeader.

- Yes, AttributesInformation can be modified at runtime by adding attributes
and attribute values.
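
A minimal sketch of runtime modification. It assumes that each nominal value is encoded by its position in a per-attribute list. `NominalRegistry`, `addNominalAttribute`, `isNominal`, `encode` and `numValues` are hypothetical names and are not the AttributesInformation API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of mutable attribute metadata: nominal attributes and
 * their values can be added while the stream is being processed. A value's
 * code is its position in the per-attribute list.
 */
final class NominalRegistry {
    private final Map<Integer, List<String>> values = new HashMap<>();

    /** Declares attribute i as nominal; does nothing if it already is. */
    void addNominalAttribute(int i) {
        values.computeIfAbsent(i, k -> new ArrayList<>());
    }

    boolean isNominal(int i) {
        return values.containsKey(i);
    }

    /**
     * Returns the code of value for attribute i, declaring the attribute and
     * appending the value if either is new. Existing codes never change.
     */
    int encode(int i, String value) {
        List<String> known = values.computeIfAbsent(i, k -> new ArrayList<>());
        int code = known.indexOf(value); // linear scan, fine for small value sets
        if (code < 0) {
            known.add(value);
            code = known.size() - 1;
        }
        return code;
    }

    int numValues(int i) {
        List<String> known = values.get(i);
        return known == null ? 0 : known.size();
    }
}

class NominalRegistryDemo {
    public static void main(String[] args) {
        NominalRegistry registry = new NominalRegistry();
        System.out.println(registry.encode(0, "sunny")); // 0
        System.out.println(registry.encode(0, "rainy")); // 1, new value added at runtime
        System.out.println(registry.encode(0, "sunny")); // 0, existing code unchanged
        System.out.println(registry.numValues(0));       // 2
    }
}
```

Appending unseen values keeps every previously assigned code stable. Re-sorting the list would reassign codes, so a value already seen by a model could change its numeric meaning.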

Cheers,

Albert

On Tue, Jan 13, 2015 at 9:18 PM, Gianmarco De Francisci Morales <[email protected]> wrote:

> Thanks Albert.
>
> I have a couple of questions.
>
> 1/ How do we distinguish between input and output attributes?
> In particular, let's take as an example the default single-label
> classification.
> I guess that is the role of Range.
> However, do we have to serialize it with every instance we send?
>
> 2/ To distinguish between numeric and categorical attributes, we need some
> metadata, which I guess goes into InstancesHeader.
> I am fine with keeping it, also for compatibility with MOA, and we might use
> it when we have access to it.
> However, I would prefer algorithms not to rely on it, and consider the
> presence of metadata optional.
>
> Some other points:
> - What's the difference between InstanceInformation and InstancesHeader?
> - Can the AttributesInformation be modified at runtime, or is it statically
> set for the whole duration of the algorithm?
>
> Cheers,
>
> --
> Gianmarco
>
> On 10 January 2015 at 04:26, Albert Bifet <[email protected]> wrote:
>
> > Hi all,
> >
> > This is a short explanation of the new Instances design for SAMOA.
> >
> > https://github.com/abifet/moa/tree/master/moa/src/main/java/com/yahoo/labs/samoa/instances
> >
> > Instances will be much simpler than in the current implementation. They
> > can be dense or sparse, and they contain only one array (or two for
> > sparse instances) with all the attribute values. The current
> > implementation has two arrays, one for input values and another for
> > output values.
> >
> > There are two main changes:
> >
> > 1/ All instances are going to be multi-label, which means they have
> > input and output attributes, and we can access their values with
> > getInputValue(i) and getOutputValue(i).
> >
> > 2/ Attributes are numeric by default, so we only keep information about
> > discrete (nominal) attributes and their values. For example, if we have
> > one million numeric attributes, we will not need to store any attribute
> > information for them.
> >
> > Basically, we have:
> >
> > - Instance: interface
> > - MultiLabelInstance: interface (empty interface that extends Instance)
> > - InstanceImpl implements MultiLabelInstance: implementation of Instance
> > that contains
> >     - InstanceData
> >     - InstancesHeader
> > - DenseInstance extends InstanceImpl
> > - SparseInstance extends InstanceImpl
> >
> > - Instances: a list of instances and an InstanceInformation object
> > - InstancesHeader extends Instances
> >
> > - InstanceData: interface
> > - DenseInstanceData implements InstanceData
> > - SparseInstanceData implements InstanceData
> >
> > - InstanceInformation contains the name, the attribute information, and
> > the attributes to predict.
> > - AttributesInformation contains two lists of Attributes (indices and
> > values) for non-numerical attributes. Numerical attributes are the
> > default, so they need no entry.
> > - Range: attributes to predict
> >
> > Cheers,
> >
> > Albert
> >
>
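
The original message says an instance holds one array of values, or two for sparse instances. The sketch below shows those dense and sparse layouts as two implementations of one interface. `ValueStore`, `DenseValues` and `SparseValues` are hypothetical stand-ins for the InstanceData roles. The sparse version assumes that the stored indices are sorted and that attributes not stored read as 0.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the InstanceData role: attribute values by index. */
interface ValueStore {
    double value(int attributeIndex);

    int numAttributes();
}

/** Dense layout: one array holding the value of every attribute. */
final class DenseValues implements ValueStore {
    private final double[] values;

    DenseValues(double[] values) {
        this.values = values.clone();
    }

    @Override
    public double value(int attributeIndex) {
        return values[attributeIndex];
    }

    @Override
    public int numAttributes() {
        return values.length;
    }
}

/**
 * Sparse layout: two parallel arrays holding the indices of the stored
 * attributes (assumed sorted ascending) and their values. Attributes that
 * are not stored read as 0.
 */
final class SparseValues implements ValueStore {
    private final int[] indices;
    private final double[] values; // values[k] belongs to attribute indices[k]
    private final int numAttributes;

    SparseValues(int[] indices, double[] values, int numAttributes) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values differ in length");
        }
        this.indices = indices.clone();
        this.values = values.clone();
        this.numAttributes = numAttributes;
    }

    @Override
    public double value(int attributeIndex) {
        int k = Arrays.binarySearch(indices, attributeIndex); // requires sorted indices
        return k >= 0 ? values[k] : 0.0;
    }

    @Override
    public int numAttributes() {
        return numAttributes;
    }
}

class ValueStoreDemo {
    public static void main(String[] args) {
        ValueStore dense = new DenseValues(new double[] {0.0, 3.0, 0.0, 0.0, 7.0});
        ValueStore sparse = new SparseValues(new int[] {1, 4}, new double[] {3.0, 7.0}, 5);
        for (int i = 0; i < dense.numAttributes(); i++) {
            // Both layouts report the same value for every attribute index.
            System.out.println(i + ": " + dense.value(i) + " " + sparse.value(i));
        }
    }
}
```

The dense layout stores one value per attribute. The sparse layout stores only the listed entries and finds them with a binary search. That makes it the better choice for instances where most attributes are zero.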
