[ https://issues.apache.org/jira/browse/MATH-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772289#action_12772289 ]

Jake Mannix commented on MATH-313:
----------------------------------

Regarding the practicality of these abstractions, we could limit the scope of 
"generalized real-valued functions" to a set of static building blocks instead:

{code}
public class Functions {

  public static abstract class ComposableFunction implements UnivariateRealFunction {
    // has all the methods we described implemented, or maybe shortens
    // "preCompose" to "of" - so you can read it "f of g..."
    // leaves this abstract:
    public abstract double value(double d);
  }

  public static final ComposableFunction Exp = new ComposableFunction() {
    public double value(double d) { return Math.exp(d); }
  };
  public static final ComposableFunction Sinh = ...

  // lots of java.lang.Math functions here, with object-oriented ways to
  // combine them

  public static abstract class BinaryFunction {
    public abstract double value(double d1, double d2);
    public ComposableFunction fix2ndArg(double secondArg) { /* impl */ }
    public ComposableFunction fix1stArg(double firstArg) { /* impl */ }
  }

  public static final BinaryFunction Pow = new BinaryFunction() {
    public double value(double d1, double d2) { return Math.pow(d1, d2); }
  };
  public static final BinaryFunction Log = new BinaryFunction() {
    // Math.log is unary, so read this as the log of d1 in base d2:
    public double value(double d1, double d2) { return Math.log(d1) / Math.log(d2); }
  };
  public static final BinaryFunction Max = new BinaryFunction() ...
  public static final BinaryFunction Min = ...
}
{code}

This keeps the abstraction within one holder class full of easy-to-use 
functional building blocks, and lets you do things like
{code}
  RealVector w = v.map(Exp.of(Negate.of(Pow.fix2ndArg(2))));
{code}
for when you want to map your vector element-wise through a gaussian.
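
For what it's worth, the {{/* impl */}} bodies in the sketch above are tiny; here is one way 
"of" and "fix2ndArg" could be filled in (a sketch only, assuming the class shapes above - 
none of this exists in commons-math yet):

{code}
// Inside ComposableFunction - "f.of(g)" builds the function x -> f(g(x)):
public ComposableFunction of(final ComposableFunction g) {
  final ComposableFunction f = this;
  return new ComposableFunction() {
    public double value(double d) { return f.value(g.value(d)); }
  };
}

// Inside BinaryFunction - fix the second argument, e.g. Pow.fix2ndArg(2) is x -> x^2:
public ComposableFunction fix2ndArg(final double secondArg) {
  final BinaryFunction f = this;
  return new ComposableFunction() {
    public double value(double d) { return f.value(d, secondArg); }
  };
}
{code}

With that in place, the gaussian example above is just three anonymous classes composed 
together, with no extra boilerplate at the call site.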

The uses for this kind of thing are pretty varied, but in general it allows for some 
really easy-to-read, concise code when combined with the Collector paradigm. Imagine 
you have this interface (with the extra collect methods, instead of just one, because 
when collecting over vectors the Collector might do something different at different 
index values - for example, a weighted euclidean dot product - and similarly for 
matrices):

{code}
public interface UnivariateCollector {
  void collect(double d);
  void collect(int i, double d);
  void collect(int i, int j, double d);
  double result();
}
{code}
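
As a concrete illustration of the paradigm (a sketch against the interface above, not 
existing code), a collector computing the euclidean norm would just accumulate squares 
and take the root in result():

{code}
// Hypothetical collector computing the euclidean norm of whatever is fed to it.
public class L2NormCollector implements UnivariateCollector {
  private double sumOfSquares = 0;

  public void collect(double d)               { sumOfSquares += d * d; }
  public void collect(int i, double d)        { collect(d); }  // index ignored
  public void collect(int i, int j, double d) { collect(d); }  // indices ignored
  public double result()                      { return Math.sqrt(sumOfSquares); }
}
{code}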

This is the interface that gets handed to collections of doubles (like, say, 
RealVector, and possibly RealMatrix, which already has a visitor, though it's a 
mutating one), where the collection side looks something like this:

{code}
public interface DoubleCollection {
  Iterator<DoubleEntry> iterator();
  Iterator<DoubleEntry> sparseIterator();
  double collect(UnivariateCollector collector);
}
{code}
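
(DoubleEntry isn't spelled out above; all that's assumed of it is an index plus a value:)

{code}
// Assumed shape of the entries the iterators hand back.
public interface DoubleEntry {
  int index();
  double value();
}
{code}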

Note I'm not specifically saying this particular interface should exist at this 
level of generality, but imagine that these methods are available on 
AbstractRealVector, at least:

{code}
public abstract class AbstractRealVector implements RealVector, DoubleCollection {

  // leave iterator() and sparseIterator() abstract

  public double collect(UnivariateCollector collector) {
    // use some logic to decide whether to take the sparse or the dense iterator
    Iterator<DoubleEntry> it = ...;
    DoubleEntry e;
    while (it.hasNext() && (e = it.next()) != null) {
      collector.collect(e.index(), e.value());
    }
    return collector.result();
  }

  // useful for generalized dot products, kernels, distances and angles:
  public double collect(BivariateCollector collector, RealVector v) {
    // use some logic based on whether this or v is instanceof SparseVector
    // to decide how to iterate both of them, then
    some loop {
      collector.collect(index, thisVectorAtIndex, vAtIndex);
    }
    return collector.result();
  }

  public double normL1() { return collect(Abs.asCollector()); }
  public double normLInf() { return collect(Abs.asCollector(Max)); }

  // and in general:
  public double normLp(final double p) {
    return Math.pow(collect(Pow.fix2ndArg(p).asCollector()), 1 / p);
  }

  public double dot(RealVector v) { return collect(Times.asCollector(), v); }

  public RealVector subtract(RealVector v) { return map(Subtract, v); }

  public RealVector ebeMultiply(RealVector v) { return map(Multiply, v); }
  // ditto for all the other ebeXXX methods

  public double distance(RealVector v) {
    return Math.sqrt(collect(new AbstractBivariateCollector() {
      public void collect(int index, double d1, double d2) { result += Math.pow(d1 - d2, 2); }
    }, v));
  }

  // similarly for L1Distance, LInfDistance and in general any Lp distance; in
  // fact, since the Collector knows what index you're on when collecting, it
  // easily deals with weighted distances, and in particular with distances
  // projected onto subspaces of missing dimensions
}
{code}
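
To make the "weighted euclidean dot product" remark above concrete, a collector like 
the following would do it (a sketch only - AbstractBivariateCollector and the 
collect(BivariateCollector, RealVector) method are the hypothetical pieces from the 
snippet above):

{code}
// Hypothetical weighted dot product: sum over i of w[i] * x[i] * y[i].
// Assumes the AbstractBivariateCollector sketched above exposes a "result" field
// and a result() accessor.
public class WeightedDotCollector extends AbstractBivariateCollector {
  private final double[] weights;

  public WeightedDotCollector(double[] weights) { this.weights = weights; }

  public void collect(int index, double d1, double d2) {
    result += weights[index] * d1 * d2;
  }
}

// usage: double wDot = v1.collect(new WeightedDotCollector(w), v2);
{code}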

The reason I bring up these kinds of things is that in Machine Learning, in general, 
you often want to do fairly arbitrary manipulations on vectors, and you may also want 
to do arbitrary combinations of them. I'm primarily interested in vectors and 
functions from vectors to reals (note: MultivariateRealFunction currently only takes 
double[] arguments, not RealVector - how to deal with the sparse case, ack!), vectors 
to vectors, and reals to reals, in the fairly generic sense, without having to write 
a ton of boilerplate every time I want to compose a function or write a generalized 
dot product. If I can't pass a function in to my vector, I need a method in another 
class, which doesn't have access to the internals of the vector - usually fine, but 
in general: vectors should know how to compute their generalized distances, lengths, 
angles, differences, inner products, etc., given a little guidance about the specific 
kind of generalized method they need to use.

Of course, yes, this can be done fully outside of the linear package: once we have, 
at the very least, access to dense + sparse iterators on RealVector, we can write a 
whole framework outside of linear which has DotProduct (defining double 
dot(RealVector v1, RealVector v2)), Distance, KernelizedNorm, etc. It can be done, 
but doing it that way is not my preference, and it dulls my desire to try to help 
make Commons-Math the linear library to use with Mahout and Decomposer.

> Functions could be more object-oriented without losing any power.
> -----------------------------------------------------------------
>
>                 Key: MATH-313
>                 URL: https://issues.apache.org/jira/browse/MATH-313
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 2.0
>         Environment: all
>            Reporter: Jake Mannix
>             Fix For: 2.1
>
>
> UnivariateRealFunction, for example, is a map from R to R.  The set of such 
> functions has tons and tons of structure: in addition to being an algebra, 
> equipped with +,-,*, and scaling by constants, it maps the same space into 
> itself, so it is composable, both pre and post.
> I'd propose we add:
> {code}
>   UnivariateRealFunction plus(UnivariateRealFunction other);
>   UnivariateRealFunction minus(UnivariateRealFunction other);
>   UnivariateRealFunction times(UnivariateRealFunction other);
>   UnivariateRealFunction times(double scale);
>   UnivariateRealFunction preCompose(UnivariateRealFunction other);
>   UnivariateRealFunction postCompose(UnivariateRealFunction other);
> {code}
> to the interface, and then implement them in an 
> AbstractUnivariateRealFunction base class.  No implementer would need to 
> notice, other than switching to extend this class rather than implement 
> UnivariateRealFunction.
> Many people don't need or use this, but... it makes for some powerfully easy 
> code:
> {code}UnivariateRealFunction gaussian = 
> Exp.preCompose(Negate.preCompose(Pow2));{code}
> which is even nicer when done anonymously passing into a map/collect method 
> (a la MATH-312).
