On Sat, Oct 3, 2009 at 12:30 PM, Ted Dunning wrote:
> Labels are the only thing that scares me. It may be that we really need to
> figure out a good answer to that in any case so that labels as an idea can
> be separated from matrices.
>
> The real problem is that matrix operations should be by label rather than
> index. If we can somehow make the indexes universal, then we should be OK.
> One way to do that is to broaden the idea of conformability in matrices to
> require that they were created using a common label dictionary for the
> conformable indexes.

What do you mean by both "labels as an idea can be separated from matrices" and "matrix operations should be by label rather than by index"? These sound contradictory to me: the latter means that matrices are inherently tied to labels.

I worry about the performance of the current API if we encourage people to always address values in a Vector via get(String label), which seems to be what you're implying if we always use labels rather than indices. What could be a method call and an array access (getQuick(index)) instead becomes a method call, a HashMap get(String), another method call, a bounds check, and then an array lookup. Maybe the JIT is smart enough to handle most of this, but I'd be surprised if there were no difference.

> Another issue is that some matrices are essentially unbounded (or we do not
> know the bounds). These matrices must by nature be sparse. This comes up
> in situations such as a document x term matrix where we do not know how
> many terms there may be, nor how many documents.

I'm totally with you on this one. I find the current setup, where Matrix and Vector impls are required to know their final dimensionality at construction, pretty constraining: it requires that I make one full pass through my data just to measure how big everything is.
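To make the label-versus-index cost and the unbounded-dimension point concrete, here is a minimal sketch assuming a hypothetical shared label dictionary. LabelDictionary and LabeledSparseVector are illustrative names, not Mahout classes, and the vector stores values in a HashMap (rather than the dense array described above) so that it can grow without a known cardinality. The point is the extra label-resolution step that get(String) adds on top of getQuick(int), and that vectors built on one dictionary share a common index space.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical shared dictionary mapping labels to dense integer indexes (not a Mahout class). */
final class LabelDictionary {
    private final String name;
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final List<String> labels = new ArrayList<>();

    LabelDictionary(String name) { this.name = name; }

    String name() { return name; }

    /** Returns the index for a label, assigning the next free index if the label is new. */
    int indexFor(String label) {
        Integer idx = indexOf.get(label);
        if (idx == null) {
            idx = labels.size();
            indexOf.put(label, idx);
            labels.add(label);
        }
        return idx;
    }

    /** Returns the index for a label, or -1 if the label has never been seen. */
    int lookup(String label) {
        Integer idx = indexOf.get(label);
        return idx == null ? -1 : idx;
    }

    int size() { return labels.size(); }
}

/** Hypothetical sparse vector with no fixed cardinality; its indexes come from a shared dictionary. */
final class LabeledSparseVector {
    private final LabelDictionary dict;
    private final Map<Integer, Double> values = new HashMap<>();

    LabeledSparseVector(LabelDictionary dict) { this.dict = dict; }

    /** Index-based access: a single lookup by int index, no label resolution. */
    double getQuick(int index) {
        Double v = values.get(index);
        return v == null ? 0.0 : v;
    }

    /** Label-based access: an extra HashMap<String, Integer> lookup before the index lookup. */
    double get(String label) {
        int idx = dict.lookup(label);
        return idx < 0 ? 0.0 : getQuick(idx);
    }

    /** Setting an unseen label grows the dictionary instead of failing a bounds check. */
    void set(String label, double value) {
        values.put(dict.indexFor(label), value);
    }

    /** Current extent: the dictionary size, which keeps growing as new labels (e.g. terms) appear. */
    int size() { return dict.size(); }
}
```

Because every vector built on the same LabelDictionary sees the same label-to-index mapping, "conformable" could mean "shares a dictionary", and no pass over the data is needed up front to fix the cardinality.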
Defining a DomainException instead of CardinalityException, to be thrown when the label sets differ, would be a lot better, as long as we only require that you carry around the *name* of the label set, not the full set, when you are working with the lower-level "by index only" APIs.

>> What are people's inclinations on this?
>
> Try an experiment?

What kind of experiment? There are a lot of ideas being thrown around: the relationship between labels and matrices, using Commons Math underlying APIs and implementations, separating Writable from Vector/Matrix, unbinding cardinalities from instantiation...

-jake
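The DomainException idea can be sketched as follows, assuming index-only vectors that carry just the name of their label set. DomainException and DomainVector are hypothetical names for illustration, not existing Mahout classes; the check compares domain names where a cardinality check would compare sizes, so the full label set never has to travel with the vector.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical unchecked exception: operands were indexed by different label sets (not a Mahout class). */
class DomainException extends RuntimeException {
    DomainException(String left, String right) {
        super("label domains differ: '" + left + "' vs '" + right + "'");
    }
}

/** Hypothetical index-only sparse vector that carries only the name of its label set. */
final class DomainVector {
    private final String domain;
    private final Map<Integer, Double> values = new HashMap<>();

    DomainVector(String domain) { this.domain = domain; }

    String domain() { return domain; }

    void setQuick(int index, double value) { values.put(index, value); }

    double getQuick(int index) {
        Double v = values.get(index);
        return v == null ? 0.0 : v;
    }

    /** Conformability is checked by comparing domain names, not cardinalities. */
    double dot(DomainVector other) {
        if (!domain.equals(other.domain)) {
            throw new DomainException(domain, other.domain);
        }
        double sum = 0.0;
        // Only this vector's non-zero entries can contribute to the sum.
        for (Map.Entry<Integer, Double> e : values.entrySet()) {
            sum += e.getValue() * other.getQuick(e.getKey());
        }
        return sum;
    }
}
```

Comparing names is cheap and keeps the fast index path intact; the trade-off is that it trusts whoever assigned the name to have used the same label-to-index mapping.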
