On Sat, Oct 3, 2009 at 12:30 PM, Ted Dunning <[email protected]> wrote:

> Labels are the only thing that scares me.  It may be that we really need to
> figure out a good answer to that in any case so that labels as an idea can
> be separated from matrices.
>
> The real problem is that matrix operations should be by label rather than
> index.  If we can somehow make the indexes universal, then we should be OK.
> One way to do that is to broaden the idea of conformability in matrices to
> require that they were created using a common label dictionary for the
> conformable indexes.
>

What do you mean by both "labels as an idea can be separated from matrices"
and "matrix operations should be by label rather than by index"?  These
sound like contradictory statements to me - the latter means that matrices
are inherently tied to labels.

I worry about the performance of the current api if we encouraged people to
always address values in a Vector via get(String label) (which seems to be
what you're implying if we encourage always using labels not indices).
What could be a method call and an array access (getQuick(index)) is
instead a method call, a HashMap get(String), another method call, a
bounds-check, and then an array lookup.  Maybe the JIT is smart enough to
handle most of this, but I'd be surprised if there wasn't a difference here.
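
To make the two access paths concrete, here is a minimal sketch.  The class
LabeledVectorSketch and its indexOf(String) helper are hypothetical
stand-ins written for illustration, not the actual Vector implementation;
only the method names get(String) and getQuick(int) come from the
discussion above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a labeled dense vector (not the real Vector API).
class LabeledVectorSketch {
  private final double[] values;
  private final Map<String, Integer> bindings = new HashMap<>();

  LabeledVectorSketch(String[] labels, double[] values) {
    if (labels.length != values.length) {
      throw new IllegalArgumentException("labels and values differ in length");
    }
    this.values = values.clone();
    for (int i = 0; i < labels.length; i++) {
      bindings.put(labels[i], i);
    }
  }

  // Fast path: one method call and one array access, no checks of our own.
  double getQuick(int index) {
    return values[index];
  }

  // Checked path: explicit bounds-check, then the fast path.
  double get(int index) {
    if (index < 0 || index >= values.length) {
      throw new IndexOutOfBoundsException("index " + index);
    }
    return getQuick(index);
  }

  // Label path: HashMap lookup, then the checked path, then the array access.
  double get(String label) {
    return get(indexOf(label));
  }

  // Assumed helper: resolve a label once so a hot loop can use getQuick.
  int indexOf(String label) {
    Integer index = bindings.get(label);
    if (index == null) {
      throw new IllegalArgumentException("unbound label: " + label);
    }
    return index;
  }

  public static void main(String[] args) {
    LabeledVectorSketch v = new LabeledVectorSketch(
        new String[] {"a", "b", "c"}, new double[] {1.0, 2.0, 3.0});
    int b = v.indexOf("b");           // pay for the hash lookup once
    System.out.println(v.get("b"));   // label path on every access
    System.out.println(v.getQuick(b)); // index path after resolving once
  }
}
```

Resolving the label to an index once, outside the inner loop, is one way to
keep labels in the api without paying for the hash lookup on every access.
Whether the difference matters in practice would need measuring.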


> Another issue is that some matrices are essentially unbounded (or we do not
> know the bounds).  These matrices must by nature be sparse.  This comes up
> in situations such as a document x term matrix where we do not know how
> many terms there may be, nor how many documents.
>

I'm totally down with you on this one - I certainly find the current setup,
where Matrix and Vector impls are required to know their final
dimensionality at construction, pretty constraining: it requires that I
make one full pass through my data just to measure how big everything is.
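
As a rough illustration of what "unbinding cardinality from instantiation"
could look like, here is a sketch of a sparse vector that needs no size at
construction.  UnboundedSparseVector and observedSize() are hypothetical
names invented for this example; nothing like this exists in the current
api.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sparse vector with no cardinality fixed at construction.
class UnboundedSparseVector {
  private final Map<Integer, Double> entries = new HashMap<>();
  private int maxIndexSeen = -1;

  // Any non-negative index is valid; the vector grows as data arrives.
  void set(int index, double value) {
    if (index < 0) {
      throw new IllegalArgumentException("negative index: " + index);
    }
    if (value == 0.0) {
      entries.remove(index);  // zeros are implicit, so do not store them
    } else {
      entries.put(index, value);
    }
    maxIndexSeen = Math.max(maxIndexSeen, index);
  }

  // Unset entries read as zero rather than throwing.
  double get(int index) {
    Double v = entries.get(index);
    return v == null ? 0.0 : v;
  }

  // Size observed so far, known only after the data has been read.
  int observedSize() {
    return maxIndexSeen + 1;
  }

  public static void main(String[] args) {
    UnboundedSparseVector doc = new UnboundedSparseVector();
    doc.set(7, 2.0);        // term ids arrive in whatever order
    doc.set(100000, 1.0);   // no pre-pass needed to learn the largest id
    System.out.println(doc.get(7) + " " + doc.observedSize());
  }
}
```

The point is only that a single pass over the data suffices: the largest
index is a by-product of loading, not a prerequisite for it.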

Defining DomainException instead of CardinalityException, to be thrown when
the label sets are different, would be a lot better, as long as we're only
requiring, say, that you carry around the *name* of the label set, not the
full set, if you are working at the lower-level "by index only" apis.
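
Something like the following is what I have in mind, as a sketch only.
DomainException does not exist yet, and DomainTaggedVector, its domain
field, and the dot() shown here are hypothetical names made up for the
example.

```java
// Hypothetical exception, thrown when two operands use different label sets.
class DomainException extends RuntimeException {
  DomainException(String left, String right) {
    super("incompatible label domains: " + left + " vs " + right);
  }
}

// Hypothetical index-only vector that carries just the NAME of its label
// set, never the label dictionary itself.
class DomainTaggedVector {
  final String domain;
  final double[] values;

  DomainTaggedVector(String domain, double[] values) {
    this.domain = domain;
    this.values = values.clone();
  }

  // Conformability check is a string comparison on the domain name.
  // Lengths may differ: entries past the shorter vector count as zero.
  double dot(DomainTaggedVector other) {
    if (!domain.equals(other.domain)) {
      throw new DomainException(domain, other.domain);
    }
    int n = Math.min(values.length, other.values.length);
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
      sum += values[i] * other.values[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    DomainTaggedVector a = new DomainTaggedVector("terms", new double[] {1, 2, 3});
    DomainTaggedVector b = new DomainTaggedVector("terms", new double[] {4, 5});
    System.out.println(a.dot(b));
    DomainTaggedVector c = new DomainTaggedVector("docs", new double[] {1, 1});
    try {
      a.dot(c);
    } catch (DomainException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The check costs one string comparison per operation, and the index-only
code path never has to touch the label dictionary.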


> > What are people's inclinations on this?
> >
>
> Try an experiment?
>

What kind of experiment?  There are a lot of ideas thrown around -
relationship between labels and matrices, using CommonsMath underlying apis
and implementations, separating Writable from Vector/Matrix, unbinding
cardinalities from instantiation...

  -jake
