Re: Questions about Vector

Sean Owen Mon, 15 Jun 2009 09:59:43 -0700

On Mon, Jun 15, 2009 at 4:17 PM, Jeff Eastman<j...@windwardsolutions.com> wrote:
> The difference is getQuick does not check bounds.


This does make sense for practical reasons. From an API perspective,
it seems slightly undesirable to have two of these methods, one of
which exists just to bypass the checks, as it might lead to silent
bugs. I see the purpose: performance. But in that case, as I am
getting to in MAHOUT-121, perhaps it is better to attack that problem
in a different way? For instance, if most of the setting is done in
vector construction, and I imagine it is, then we might want instead
some kind of bulk-set method.
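
To make the bulk-set idea concrete, here is a rough sketch. The class
DenseVectorSketch and the method setAll() are hypothetical names made up
for illustration, not anything in Mahout today. The point is only that
the range is validated once per call instead of once per element:

```java
// Hypothetical stand-in for a dense vector; not Mahout's actual class.
final class DenseVectorSketch {
  private final double[] values;

  DenseVectorSketch(int cardinality) {
    values = new double[cardinality];
  }

  // Checked single-element accessor: one bounds check per call.
  double get(int index) {
    if (index < 0 || index >= values.length) {
      throw new IndexOutOfBoundsException("index " + index);
    }
    return values[index];
  }

  // Hypothetical bulk setter: validates the whole range once, then copies.
  void setAll(int offset, double[] source) {
    if (offset < 0 || source.length > values.length - offset) {
      throw new IndexOutOfBoundsException(
          "range [" + offset + ", " + (offset + source.length) + ") does not fit");
    }
    System.arraycopy(source, 0, values, offset, source.length);
  }
}
```

With something like this, construction code would not need an unchecked
setter at all, since the per-element check is no longer on the hot path.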


> IIRC, clone of a SparseVector would share the map. Not desirable IMHO.

The default implementation of clone() produces a shallow copy, indeed,
but that is why it may be overridden. It seems slightly better to just
call this method clone() and implement Cloneable to fit into the
standard APIs.
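
For what it's worth, a minimal sketch of what I mean by overriding
clone(). SparseVectorSketch and its HashMap backing are assumptions for
illustration only, not the real SparseVector internals:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sparse vector backed by a map; not Mahout's SparseVector.
final class SparseVectorSketch implements Cloneable {
  private final int cardinality;
  private Map<Integer, Double> values = new HashMap<>();

  SparseVectorSketch(int cardinality) {
    this.cardinality = cardinality;
  }

  int cardinality() {
    return cardinality;
  }

  void set(int index, double value) {
    values.put(index, value);
  }

  double get(int index) {
    Double v = values.get(index);
    return v == null ? 0.0 : v;
  }

  @Override
  public SparseVectorSketch clone() {
    try {
      // Object.clone() gives a shallow copy, so the map is still shared here.
      SparseVectorSketch copy = (SparseVectorSketch) super.clone();
      // Replace the shared map with a private copy of its entries.
      copy.values = new HashMap<>(values);
      return copy;
    } catch (CloneNotSupportedException e) {
      // Cannot happen: this class implements Cloneable.
      throw new AssertionError(e);
    }
  }
}
```

After a.clone(), setting an element on the copy leaves a untouched,
which is the behavior I would expect callers to want.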


> Cardinality is the actual dimension and size is the current number of
> elements. It only differs from cardinality in the case of SparseVector, in
> which case size returns the number of non-zero elements.

Got it. Is size() a useful value to expose? It seems like an
implementation detail. What would a caller, in general, do with that?
You could say it can't hurt to expose, but, for instance, the first use
of it that came up in my IDE appears to be a bug (I could be wrong) in
UncommonsDistributions:

  public static Vector rDirichlet(Vector alpha) {
    Vector r = alpha.like();
    double total = alpha.zSum();
    double remainder = 1;
    for (int i = 0; i < r.size(); i++) {
      double a = alpha.get(i);
      total -= a;
      double beta = rBeta(a, Math.max(0, total));
      double p = beta * remainder;
      r.set(i, p);
      remainder -= p;
    }
    return r;
  }

Is cardinality() intended here, rather than size()?
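
To show why I suspect a bug, here is a toy model of the two values as
described above. Everything in it is hypothetical (the class, the helper
methods), and it assumes like() returns an empty vector of the same
cardinality. Under that assumption a loop bounded by size() never runs
on a freshly created sparse vector:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sparse vector: size() counts non-zero entries only.
final class SizeVsCardinalitySketch {
  private final int cardinality;
  private final Map<Integer, Double> nonZero = new HashMap<>();

  SizeVsCardinalitySketch(int cardinality) {
    this.cardinality = cardinality;
  }

  int cardinality() {
    return cardinality;
  }

  int size() {
    return nonZero.size();
  }

  void set(int index, double value) {
    if (value == 0.0) {
      nonZero.remove(index);
    } else {
      nonZero.put(index, value);
    }
  }

  // Assumed behavior: an empty vector with the same cardinality.
  SizeVsCardinalitySketch like() {
    return new SizeVsCardinalitySketch(cardinality);
  }

  // Mirrors the loop bound in the snippet above; returns iterations run.
  static int fillUpToSize(SizeVsCardinalitySketch r) {
    int visited = 0;
    for (int i = 0; i < r.size(); i++) {
      r.set(i, 1.0);
      visited++;
    }
    return visited;
  }

  // Same loop bounded by cardinality(); visits every index.
  static int fillUpToCardinality(SizeVsCardinalitySketch r) {
    int visited = 0;
    for (int i = 0; i < r.cardinality(); i++) {
      r.set(i, 1.0);
      visited++;
    }
    return visited;
  }
}
```

For a dense vector the two bounds coincide, so the problem would only
show up when a sparse alpha is passed in.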
