Repository: systemml Updated Branches: refs/heads/master 62647de61 -> bc6e941ce
[SYSTEMML-2499] Built-in functions for binomial distribution Project: http://git-wip-us.apache.org/repos/asf/systemml/repo Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/bc6e941c Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/bc6e941c Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/bc6e941c Branch: refs/heads/master Commit: bc6e941ce1b4cd17027e3f55c0b89013a3d0a801 Parents: 62647de Author: Berthold Reinwald <[email protected]> Authored: Thu Nov 29 17:32:19 2018 -0800 Committer: Berthold Reinwald <[email protected]> Committed: Thu Nov 29 17:32:19 2018 -0800 ---------------------------------------------------------------------- docs/dml-language-reference.md | 13 +- .../org/apache/sysml/parser/DMLTranslator.java | 6 + .../org/apache/sysml/parser/Expression.java | 2 +- .../ParameterizedBuiltinFunctionExpression.java | 9 + .../functionobjects/ParameterizedBuiltin.java | 444 +++++++++++-------- .../unary/scalar/FullDistributionTest.java | 22 +- .../functions/unary/scalar/DFTest_BINOMIAL.R | 35 ++ .../functions/unary/scalar/DFTest_BINOMIAL.dml | 45 ++ 8 files changed, 381 insertions(+), 195 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/systemml/blob/bc6e941c/docs/dml-language-reference.md ---------------------------------------------------------------------- diff --git a/docs/dml-language-reference.md b/docs/dml-language-reference.md index cdcc529..6f1c854 100644 --- a/docs/dml-language-reference.md +++ b/docs/dml-language-reference.md @@ -691,7 +691,7 @@ moment() | Returns the kth central moment of values in a column matrix V, where colSums() <br/> colMeans() <br/> colVars() <br/> colSds() <br/> colMaxs() <br/> colMins() | Column-wise computations -- for each column, compute the sum/mean/variance/stdDev/max/min of cell values | Input: matrix <br/> Output: (1 x n) matrix | colSums(X) <br/> colMeans(X) <br/> colVars(X) <br/> colSds(X) <br/> colMaxs(X) <br/>colMins(X) cov() | Returns the covariance between two 1-dimensional column matrices X and Y. The function takes an optional weights parameter W. All column matrices X, Y, and W (when specified) must have the exact same dimension. | Input: (X <(n x 1) matrix>, Y <(n x 1) matrix> [, W <(n x 1) matrix>)]) <br/> Output: <scalar> | cov(X,Y) <br/> cov(X,Y,W) table() | Returns the contingency table of two vectors A and B. The resulting table F consists of max(A) rows and max(B) columns. <br/> More precisely, F[i,j] = \\|{ k \\| A[k] = i and B[k] = j, 1 ⤠k ⤠n }\\|, where A and B are two n-dimensional vectors. <br/> This function supports multiple other variants, which can be found below, at the end of this Table 7. | Input: (<(n x 1) matrix>, <(n x 1) matrix>), [<(n x 1) matrix>]) <br/> Output: <matrix> | F = table(A, B) <br/> F = table(A, B, C) <br/> And, several other forms (see below Table 7.) -cdf()<br/> pnorm()<br/> pexp()<br/> pchisq()<br/> pf()<br/> pt()<br/> icdf()<br/> qnorm()<br/> qexp()<br/> qchisq()<br/> qf()<br/> qt() | p=cdf(target=q, ...) returns the cumulative probability P[X <= q]. <br/> q=icdf(target=p, ...) returns the inverse cumulative probability i.e., it returns q such that the given target p = P[X<=q]. <br/> For more details, please see the section "Probability Distribution Functions" below Table 7. | Input: (target=<scalar>, dist="...", ...) <br/> Output: <scalar> | p = cdf(target=q, dist="normal", mean=1.5, sd=2); is same as p=pnorm(target=q, mean=1.5, sd=2); <br/> q=icdf(target=p, dist="normal") is same as q=qnorm(target=p, mean=0,sd=1) <br/> More examples can be found in the section "Probability Distribution Functions" below Table 7. +cdf()<br/> pnorm()<br/> pbinomial()<br/>pexp()<br/> pchisq()<br/> pf()<br/> pt()<br/> icdf()<br/> qnorm()<br/> qbinomial()<br/>qexp()<br/> qchisq()<br/> qf()<br/> qt() | p=cdf(target=q, ...) returns the cumulative probability P[X <= q]. <br/> q=icdf(target=p, ...) returns the inverse cumulative probability i.e., it returns q such that the given target p = P[X<=q]. <br/> For more details, please see the section "Probability Distribution Functions" below Table 7. | Input: (target=<scalar>, dist="...", ...) <br/> Output: <scalar> | p = cdf(target=q, dist="normal", mean=1.5, sd=2); is same as p=pnorm(target=q, mean=1.5, sd=2); <br/> q=icdf(target=p, dist="normal") is same as q=qnorm(target=p, mean=0,sd=1) <br/> More examples can be found in the section "Probability Distribution Functions" below Table 7. aggregate() | Splits/groups the values from X according to the corresponding values from G, and then applies the function fn on each group. <br/> The result F is a column matrix, in which each row contains the value computed from a distinct group in G. More specifically, F[k,1] = fn( {X[i,1] \\| 1<=i<=n and G[i,1] = k} ), where n = nrow(X) = nrow(G). <br/> Note that the distinct values in G are used as row indexes in the result matrix F. Therefore, nrow(F) = max(G). It is thus recommended that the values in G are consecutive and start from 1. <br/> This function supports multiple other variants, which can be found below, at the end of this Table 7. | Input:<br/> (target = X <(n x 1) matrix, or matrix>,<br/> groups = G <(n x 1) matrix>,<br/> fn= "..." <br/> [,weights= W<(n x 1) matrix>] <br/> [,ngroups=N] )<br/>Output: F <matrix> <br/> Note: X is a (n x 1) matrix unless ngroups is sp ecified with no weights, in which case X is a regular (n x m) matrix.<br/> The parameter fn takes one of the following functions: "count", "sum", "mean", "variance", "centralmoment". In the case of central moment, one must also provide the order of the moment that need to be computed (see example). | F = aggregate(target=X, groups=G, fn= "..." [,weights = W]) <br/> F = aggregate(target=X, groups=G1, fn= "sum"); <br/> F = aggregate(target=Y, groups=G2, fn= "mean", weights=W); <br/> F = aggregate(target=Z, groups=G3, fn= "centralmoment", order= "2"); <br/> And, several other forms (see below Table 7.) interQuartileMean() | Returns the mean of all x in X such that x>quantile(X, 0.25) and x<=quantile(X, 0.75). X, W are column matrices (vectors) of the same size. W contains the weights for data in X. | Input: (X <(n x 1) matrix> [, W <(n x 1) matrix>)]) <br/> Output: <scalar> | interQuartileMean(X) <br/> interQuartileMean(X, W) quantile () | The p-quantile for a random variable X is the value x such that Pr[X<x] <= p and Pr[X<= x] >= p <br/> let n=nrow(X), i=ceiling(p*n), quantile() will return X[i]. p is a scalar (0<p<1) that specifies the quantile to be computed. Optionally, a weight vector may be provided for X. | Input: (X <(n x 1) matrix>, [W <(n x 1) matrix>),] p <scalar>) <br/> Output: <scalar> | quantile(X, p) <br/> quantile(X, W, p) @@ -749,6 +749,7 @@ This computes the cumulative probability at the given quantile i.e., P[X<=q], * `dist`: name of the distribution specified as a string. Valid values are "normal" (for Normal or Gaussian distribution), "f" (for F distribution), "t" (for Student t-distribution), "chisq" (for Chi Squared distribution), and "exp" (for Exponential distribution). This is a mandatory argument. * `...`: parameters of the distribution * For `dist="normal"`, valid parameters are mean and sd that specify the mean and standard deviation of the normal distribution. The default values for mean and sd are 0.0 and 1.0, respectively. + * For `dist="binomial"`, valid parameters are trials and p that specify the number of trials and probability of success. Both parameters are mandatory. * For `dist="f"`, valid parameters are df1 and df2 that specify two degrees of freedom. Both these parameters are mandatory. * For `dist="t"`, and dist="chisq", valid parameter is df that specifies the degrees of freedom. This parameter is mandatory. * For `dist="exp"`, valid parameter is rate that specifies the rate at which events occur. Note that the mean of exponential distribution is 1.0/rate. The default value is 1.0. @@ -763,7 +764,7 @@ This computes the inverse cumulative probability i.e., it computes a quantile q * `dist`: name of the distribution specified as a string. Same as that in cdf(). * `...`: parameters of the distribution. Same as those in cdf(). -Alternative to `cdf()` and `icdf()`, users can also use distribution-specific functions. The functions `pnorm()`, `pf()`, `pt()`, `pchisq()`, and `pexp()` computes the cumulative probabilities for Normal, F, t, Chi Squared, and Exponential distributions, respectively. Appropriate distribution parameters must be provided for each function. Similarly, `qnorm()`, `qf()`, `qt()`, `qchisq()`, and `qexp()` compute the inverse cumulative probabilities for Normal, F, t, Chi Squared, and Exponential distributions. +Alternative to `cdf()` and `icdf()`, users can also use distribution-specific functions. The functions `pnorm()`, `pbinomial()`, `pf()`, `pt()`, `pchisq()`, and `pexp()` computes the cumulative probabilities for Normal, Binomial, F, t, Chi Squared, and Exponential distributions, respectively. Appropriate distribution parameters must be provided for each function. Similarly, `qnorm()`, `qbinomial()`, `qf()`, `qt()`, `qchisq()`, and `qexp()` compute the inverse cumulative probabilities for Normal, Binomial, F, t, Chi Squared, and Exponential distributions. Following pairs of DML statements are equivalent. @@ -771,6 +772,10 @@ Following pairs of DML statements are equivalent. is same as `p=pnorm(target=q, mean=1.5, sd=2);` +`p = cdf(target=q, dist="binomial", trials=20, p=0.25);` +is same as +`p=pbinomial(target=q, trials=20, p=0.25);` + `p = cdf(target=q, dist="exp", rate=5);` is same as `pexp(target=q,rate=5);` @@ -801,6 +806,10 @@ Examples of icdf(): is same as `q=qnorm(target=p, mean=0,sd=1);` +`q=icdf(target=p, dist="binomial", trials=20, p=0.25);` +is same as +`q=qbinomial(target=p, trials=20, p=0.25);` + `q=icdf(target=p, dist="exp");` is same as `q=qexp(target=p, rate=1);` http://git-wip-us.apache.org/repos/asf/systemml/blob/bc6e941c/src/main/java/org/apache/sysml/parser/DMLTranslator.java ---------------------------------------------------------------------- diff --git a/src/main/java/org/apache/sysml/parser/DMLTranslator.java b/src/main/java/org/apache/sysml/parser/DMLTranslator.java index e3db435..7a71440 100644 --- a/src/main/java/org/apache/sysml/parser/DMLTranslator.java +++ b/src/main/java/org/apache/sysml/parser/DMLTranslator.java @@ -2016,6 +2016,10 @@ public class DMLTranslator case CDF: case INVCDF: break; + case QBINOMIAL: + case PBINOMIAL: + distLop = new LiteralOp("binomial"); + break; default: throw new HopsException("Invalid operation: " + op); @@ -2099,11 +2103,13 @@ public class DMLTranslator switch(source.getOpCode()) { case CDF: case INVCDF: + case QBINOMIAL: case QNORM: case QT: case QF: case QCHISQ: case QEXP: + case PBINOMIAL: case PNORM: case PT: case PF: http://git-wip-us.apache.org/repos/asf/systemml/blob/bc6e941c/src/main/java/org/apache/sysml/parser/Expression.java ---------------------------------------------------------------------- diff --git a/src/main/java/org/apache/sysml/parser/Expression.java b/src/main/java/org/apache/sysml/parser/Expression.java index 33fca66..2e765d3 100644 --- a/src/main/java/org/apache/sysml/parser/Expression.java +++ b/src/main/java/org/apache/sysml/parser/Expression.java @@ -159,7 +159,7 @@ public abstract class Expression implements ParseInfo public enum ParameterizedBuiltinFunctionOp { GROUPEDAGG, RMEMPTY, REPLACE, ORDER, LOWER_TRI, UPPER_TRI, // Distribution Functions - CDF, INVCDF, PNORM, QNORM, PT, QT, PF, QF, PCHISQ, QCHISQ, PEXP, QEXP, + CDF, INVCDF, PNORM, QNORM, PT, QT, PF, QF, PCHISQ, QCHISQ, PEXP, QEXP, PBINOMIAL, QBINOMIAL, TRANSFORMAPPLY, TRANSFORMDECODE, TRANSFORMENCODE, TRANSFORMCOLMAP, TRANSFORMMETA, TOSTRING, // The "toString" method for DML; named arguments accepted to format output LIST, // named argument lists; unnamed lists become builtin function http://git-wip-us.apache.org/repos/asf/systemml/blob/bc6e941c/src/main/java/org/apache/sysml/parser/ParameterizedBuiltinFunctionExpression.java ---------------------------------------------------------------------- diff --git a/src/main/java/org/apache/sysml/parser/ParameterizedBuiltinFunctionExpression.java b/src/main/java/org/apache/sysml/parser/ParameterizedBuiltinFunctionExpression.java index 7c516e6..fdd4f23 100644 --- a/src/main/java/org/apache/sysml/parser/ParameterizedBuiltinFunctionExpression.java +++ b/src/main/java/org/apache/sysml/parser/ParameterizedBuiltinFunctionExpression.java @@ -65,6 +65,7 @@ public class ParameterizedBuiltinFunctionExpression extends DataIdentifier opcodeMap.put("pf", Expression.ParameterizedBuiltinFunctionOp.PF); opcodeMap.put("pchisq", Expression.ParameterizedBuiltinFunctionOp.PCHISQ); opcodeMap.put("pexp", Expression.ParameterizedBuiltinFunctionOp.PEXP); + opcodeMap.put("pbinomial", Expression.ParameterizedBuiltinFunctionOp.PBINOMIAL); opcodeMap.put("icdf", Expression.ParameterizedBuiltinFunctionOp.INVCDF); opcodeMap.put("qnorm", Expression.ParameterizedBuiltinFunctionOp.QNORM); @@ -72,6 +73,7 @@ public class ParameterizedBuiltinFunctionExpression extends DataIdentifier opcodeMap.put("qf", Expression.ParameterizedBuiltinFunctionOp.QF); opcodeMap.put("qchisq", Expression.ParameterizedBuiltinFunctionOp.QCHISQ); opcodeMap.put("qexp", Expression.ParameterizedBuiltinFunctionOp.QEXP); + opcodeMap.put("qbinomial", Expression.ParameterizedBuiltinFunctionOp.QBINOMIAL); // data transformation functions opcodeMap.put("transformapply", Expression.ParameterizedBuiltinFunctionOp.TRANSFORMAPPLY); @@ -107,6 +109,8 @@ public class ParameterizedBuiltinFunctionExpression extends DataIdentifier pbHopMap.put(Expression.ParameterizedBuiltinFunctionOp.PF, ParamBuiltinOp.CDF); pbHopMap.put(Expression.ParameterizedBuiltinFunctionOp.PCHISQ, ParamBuiltinOp.CDF); pbHopMap.put(Expression.ParameterizedBuiltinFunctionOp.PEXP, ParamBuiltinOp.CDF); + pbHopMap.put(Expression.ParameterizedBuiltinFunctionOp.PBINOMIAL, ParamBuiltinOp.CDF); + pbHopMap.put(Expression.ParameterizedBuiltinFunctionOp.INVCDF, ParamBuiltinOp.INVCDF); pbHopMap.put(Expression.ParameterizedBuiltinFunctionOp.QNORM, ParamBuiltinOp.INVCDF); @@ -114,6 +118,7 @@ public class ParameterizedBuiltinFunctionExpression extends DataIdentifier pbHopMap.put(Expression.ParameterizedBuiltinFunctionOp.QF, ParamBuiltinOp.INVCDF); pbHopMap.put(Expression.ParameterizedBuiltinFunctionOp.QCHISQ, ParamBuiltinOp.INVCDF); pbHopMap.put(Expression.ParameterizedBuiltinFunctionOp.QEXP, ParamBuiltinOp.INVCDF); + pbHopMap.put(Expression.ParameterizedBuiltinFunctionOp.QBINOMIAL, ParamBuiltinOp.INVCDF); // toString pbHopMap.put(Expression.ParameterizedBuiltinFunctionOp.TOSTRING, ParamBuiltinOp.TOSTRING); @@ -225,6 +230,8 @@ public class ParameterizedBuiltinFunctionExpression extends DataIdentifier case QCHISQ: case PEXP: case QEXP: + case PBINOMIAL: + case QBINOMIAL: validateDistributionFunctions(output, conditional); break; @@ -784,6 +791,7 @@ public class ParameterizedBuiltinFunctionExpression extends DataIdentifier switch(op) { case INVCDF: case QNORM: + case QBINOMIAL: case QF: case QT: case QCHISQ: @@ -795,6 +803,7 @@ public class ParameterizedBuiltinFunctionExpression extends DataIdentifier case CDF: case PNORM: + case PBINOMIAL: case PF: case PT: case PCHISQ: http://git-wip-us.apache.org/repos/asf/systemml/blob/bc6e941c/src/main/java/org/apache/sysml/runtime/functionobjects/ParameterizedBuiltin.java ---------------------------------------------------------------------- diff --git a/src/main/java/org/apache/sysml/runtime/functionobjects/ParameterizedBuiltin.java b/src/main/java/org/apache/sysml/runtime/functionobjects/ParameterizedBuiltin.java index 7f4345e..5662f91 100644 --- a/src/main/java/org/apache/sysml/runtime/functionobjects/ParameterizedBuiltin.java +++ b/src/main/java/org/apache/sysml/runtime/functionobjects/ParameterizedBuiltin.java @@ -22,301 +22,363 @@ package org.apache.sysml.runtime.functionobjects; import java.util.HashMap; import org.apache.commons.math3.distribution.AbstractRealDistribution; +import org.apache.commons.math3.distribution.AbstractIntegerDistribution; import org.apache.commons.math3.distribution.ChiSquaredDistribution; import org.apache.commons.math3.distribution.ExponentialDistribution; import org.apache.commons.math3.distribution.FDistribution; import org.apache.commons.math3.distribution.NormalDistribution; +import org.apache.commons.math3.distribution.BinomialDistribution; import org.apache.commons.math3.distribution.TDistribution; import org.apache.sysml.runtime.DMLRuntimeException; import org.apache.sysml.runtime.util.UtilFunctions; - /** - * Function object for builtin function that takes a list of name=value parameters. - * This class can not be instantiated elsewhere. + * Function object for builtin function that takes a list of name=value + * parameters. This class can not be instantiated elsewhere. */ - -public class ParameterizedBuiltin extends ValueFunction -{ +public class ParameterizedBuiltin extends ValueFunction { private static final long serialVersionUID = -5966242955816522697L; - - public enum ParameterizedBuiltinCode { - CDF, INVCDF, RMEMPTY, REPLACE, REXPAND, LOWER_TRI, UPPER_TRI, - TRANSFORMAPPLY, TRANSFORMDECODE, PARAMSERV } - public enum ProbabilityDistributionCode { - INVALID, NORMAL, EXP, CHISQ, F, T } - + + public enum ParameterizedBuiltinCode { + CDF, INVCDF, RMEMPTY, REPLACE, REXPAND, LOWER_TRI, UPPER_TRI, TRANSFORMAPPLY, TRANSFORMDECODE, PARAMSERV + } + + public enum ProbabilityDistributionCode { + INVALID, NORMAL, EXP, CHISQ, F, T, BINOMIAL + } + public ParameterizedBuiltinCode bFunc; public ProbabilityDistributionCode distFunc; - + static public HashMap<String, ParameterizedBuiltinCode> String2ParameterizedBuiltinCode; static { String2ParameterizedBuiltinCode = new HashMap<>(); - String2ParameterizedBuiltinCode.put( "cdf", ParameterizedBuiltinCode.CDF); - String2ParameterizedBuiltinCode.put( "invcdf", ParameterizedBuiltinCode.INVCDF); - String2ParameterizedBuiltinCode.put( "rmempty", ParameterizedBuiltinCode.RMEMPTY); - String2ParameterizedBuiltinCode.put( "replace", ParameterizedBuiltinCode.REPLACE); - String2ParameterizedBuiltinCode.put( "lowertri", ParameterizedBuiltinCode.LOWER_TRI); - String2ParameterizedBuiltinCode.put( "uppertri", ParameterizedBuiltinCode.UPPER_TRI); - String2ParameterizedBuiltinCode.put( "rexpand", ParameterizedBuiltinCode.REXPAND); - String2ParameterizedBuiltinCode.put( "transformapply", ParameterizedBuiltinCode.TRANSFORMAPPLY); - String2ParameterizedBuiltinCode.put( "transformdecode", ParameterizedBuiltinCode.TRANSFORMDECODE); - String2ParameterizedBuiltinCode.put( "paramserv", ParameterizedBuiltinCode.PARAMSERV); + String2ParameterizedBuiltinCode.put("cdf", ParameterizedBuiltinCode.CDF); + String2ParameterizedBuiltinCode.put("invcdf", ParameterizedBuiltinCode.INVCDF); + String2ParameterizedBuiltinCode.put("rmempty", ParameterizedBuiltinCode.RMEMPTY); + String2ParameterizedBuiltinCode.put("replace", ParameterizedBuiltinCode.REPLACE); + String2ParameterizedBuiltinCode.put("lowertri", ParameterizedBuiltinCode.LOWER_TRI); + String2ParameterizedBuiltinCode.put("uppertri", ParameterizedBuiltinCode.UPPER_TRI); + String2ParameterizedBuiltinCode.put("rexpand", ParameterizedBuiltinCode.REXPAND); + String2ParameterizedBuiltinCode.put("transformapply", ParameterizedBuiltinCode.TRANSFORMAPPLY); + String2ParameterizedBuiltinCode.put("transformdecode", ParameterizedBuiltinCode.TRANSFORMDECODE); + String2ParameterizedBuiltinCode.put("paramserv", ParameterizedBuiltinCode.PARAMSERV); } - + static public HashMap<String, ProbabilityDistributionCode> String2DistCode; static { String2DistCode = new HashMap<>(); - String2DistCode.put("normal" , ProbabilityDistributionCode.NORMAL); - String2DistCode.put("exp" , ProbabilityDistributionCode.EXP); - String2DistCode.put("chisq" , ProbabilityDistributionCode.CHISQ); - String2DistCode.put("f" , ProbabilityDistributionCode.F); - String2DistCode.put("t" , ProbabilityDistributionCode.T); + String2DistCode.put("normal", ProbabilityDistributionCode.NORMAL); + String2DistCode.put("exp", ProbabilityDistributionCode.EXP); + String2DistCode.put("chisq", ProbabilityDistributionCode.CHISQ); + String2DistCode.put("f", ProbabilityDistributionCode.F); + String2DistCode.put("t", ProbabilityDistributionCode.T); + String2DistCode.put("binomial", ProbabilityDistributionCode.BINOMIAL); } - + // We should create one object for every builtin function that we support - private static ParameterizedBuiltin normalObj = null, expObj = null, chisqObj = null, fObj = null, tObj = null; - private static ParameterizedBuiltin inormalObj = null, iexpObj = null, ichisqObj = null, ifObj = null, itObj = null; - + private static ParameterizedBuiltin normalObj = null, expObj = null, chisqObj = null, fObj = null, tObj = null, + binomialObj = null; + private static ParameterizedBuiltin inormalObj = null, iexpObj = null, ichisqObj = null, ifObj = null, itObj = null, ibinomialObj; + private ParameterizedBuiltin(ParameterizedBuiltinCode bf) { bFunc = bf; distFunc = ProbabilityDistributionCode.INVALID; } - + private ParameterizedBuiltin(ParameterizedBuiltinCode bf, ProbabilityDistributionCode dist) { bFunc = bf; distFunc = dist; } - public static ParameterizedBuiltin getParameterizedBuiltinFnObject (String str) { - return getParameterizedBuiltinFnObject (str, null); + public static ParameterizedBuiltin getParameterizedBuiltinFnObject(String str) { + return getParameterizedBuiltinFnObject(str, null); } - public static ParameterizedBuiltin getParameterizedBuiltinFnObject (String str, String str2) { - + public static ParameterizedBuiltin getParameterizedBuiltinFnObject(String str, String str2) { + ParameterizedBuiltinCode code = String2ParameterizedBuiltinCode.get(str); - - switch ( code ) - { - case CDF: - // str2 will point the appropriate distribution - ProbabilityDistributionCode dcode = String2DistCode.get(str2.toLowerCase()); - - switch(dcode) { - case NORMAL: - if ( normalObj == null ) - normalObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.CDF, dcode); - return normalObj; - case EXP: - if ( expObj == null ) - expObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.CDF, dcode); - return expObj; - case CHISQ: - if ( chisqObj == null ) - chisqObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.CDF, dcode); - return chisqObj; - case F: - if ( fObj == null ) - fObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.CDF, dcode); - return fObj; - case T: - if ( tObj == null ) - tObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.CDF, dcode); - return tObj; - default: - throw new DMLRuntimeException("Invalid distribution code: " + dcode); - } - - case INVCDF: - // str2 will point the appropriate distribution - ProbabilityDistributionCode distcode = String2DistCode.get(str2.toLowerCase()); - - switch(distcode) { - case NORMAL: - if ( inormalObj == null ) - inormalObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.INVCDF, distcode); - return inormalObj; - case EXP: - if ( iexpObj == null ) - iexpObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.INVCDF, distcode); - return iexpObj; - case CHISQ: - if ( ichisqObj == null ) - ichisqObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.INVCDF, distcode); - return ichisqObj; - case F: - if ( ifObj == null ) - ifObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.INVCDF, distcode); - return ifObj; - case T: - if ( itObj == null ) - itObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.INVCDF, distcode); - return itObj; - default: - throw new DMLRuntimeException("Invalid distribution code: " + distcode); - } - - case RMEMPTY: - return new ParameterizedBuiltin(ParameterizedBuiltinCode.RMEMPTY); - - case REPLACE: - return new ParameterizedBuiltin(ParameterizedBuiltinCode.REPLACE); - - case LOWER_TRI: - return new ParameterizedBuiltin(ParameterizedBuiltinCode.LOWER_TRI); - - case UPPER_TRI: - return new ParameterizedBuiltin(ParameterizedBuiltinCode.UPPER_TRI); - - case REXPAND: - return new ParameterizedBuiltin(ParameterizedBuiltinCode.REXPAND); - - case TRANSFORMAPPLY: - return new ParameterizedBuiltin(ParameterizedBuiltinCode.TRANSFORMAPPLY); - - case TRANSFORMDECODE: - return new ParameterizedBuiltin(ParameterizedBuiltinCode.TRANSFORMDECODE); - - case PARAMSERV: - return new ParameterizedBuiltin(ParameterizedBuiltinCode.PARAMSERV); - + + switch (code) { + case CDF: + // str2 will point the appropriate distribution + ProbabilityDistributionCode dcode = String2DistCode.get(str2.toLowerCase()); + + switch (dcode) { + case NORMAL: + if (normalObj == null) + normalObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.CDF, dcode); + return normalObj; + case EXP: + if (expObj == null) + expObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.CDF, dcode); + return expObj; + case CHISQ: + if (chisqObj == null) + chisqObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.CDF, dcode); + return chisqObj; + case F: + if (fObj == null) + fObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.CDF, dcode); + return fObj; + case T: + if (tObj == null) + tObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.CDF, dcode); + return tObj; + case BINOMIAL: + if (binomialObj == null) + binomialObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.CDF, dcode); + return binomialObj; + default: + throw new DMLRuntimeException("Invalid distribution code: " + dcode); + } + + case INVCDF: + // str2 will point the appropriate distribution + ProbabilityDistributionCode distcode = String2DistCode.get(str2.toLowerCase()); + + switch (distcode) { + case NORMAL: + if (inormalObj == null) + inormalObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.INVCDF, distcode); + return inormalObj; + case EXP: + if (iexpObj == null) + iexpObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.INVCDF, distcode); + return iexpObj; + case CHISQ: + if (ichisqObj == null) + ichisqObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.INVCDF, distcode); + return ichisqObj; + case F: + if (ifObj == null) + ifObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.INVCDF, distcode); + return ifObj; + case T: + if (itObj == null) + itObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.INVCDF, distcode); + return itObj; + case BINOMIAL: + if (ibinomialObj == null) + ibinomialObj = new ParameterizedBuiltin(ParameterizedBuiltinCode.INVCDF, distcode); + return ibinomialObj; default: - throw new DMLRuntimeException("Invalid parameterized builtin code: " + code); + throw new DMLRuntimeException("Invalid distribution code: " + distcode); + } + + case RMEMPTY: + return new ParameterizedBuiltin(ParameterizedBuiltinCode.RMEMPTY); + + case REPLACE: + return new ParameterizedBuiltin(ParameterizedBuiltinCode.REPLACE); + + case LOWER_TRI: + return new ParameterizedBuiltin(ParameterizedBuiltinCode.LOWER_TRI); + + case UPPER_TRI: + return new ParameterizedBuiltin(ParameterizedBuiltinCode.UPPER_TRI); + + case REXPAND: + return new ParameterizedBuiltin(ParameterizedBuiltinCode.REXPAND); + + case TRANSFORMAPPLY: + return new ParameterizedBuiltin(ParameterizedBuiltinCode.TRANSFORMAPPLY); + + case TRANSFORMDECODE: + return new ParameterizedBuiltin(ParameterizedBuiltinCode.TRANSFORMDECODE); + + case PARAMSERV: + return new ParameterizedBuiltin(ParameterizedBuiltinCode.PARAMSERV); + + default: + throw new DMLRuntimeException("Invalid parameterized builtin code: " + code); } } - + @Override - public double execute(HashMap<String,String> params) { - switch(bFunc) { + public double execute(HashMap<String, String> params) { + switch (bFunc) { case CDF: case INVCDF: - switch(distFunc) { + switch (distFunc) { case NORMAL: case EXP: case CHISQ: case F: case T: - return computeFromDistribution(distFunc, params, (bFunc==ParameterizedBuiltinCode.INVCDF)); + case BINOMIAL: + return computeFromDistribution(distFunc, params, (bFunc == ParameterizedBuiltinCode.INVCDF)); default: - throw new DMLRuntimeException("Unsupported distribution (" + distFunc + ")."); + throw new DMLRuntimeException("Unsupported distribution (" + distFunc + ")."); } - + default: throw new DMLRuntimeException("ParameterizedBuiltin.execute(): Unknown operation: " + bFunc); } } - + /** - * Helper function to compute distribution-specific cdf (both lowertail and uppertail) and inverse cdf. + * Helper function to compute distribution-specific cdf (both lowertail and + * uppertail) and inverse cdf. * - * @param dcode probablility distribution code - * @param params map of parameters + * @param dcode probablility distribution code + * @param params map of parameters * @param inverse true if inverse * @return cdf or inverse cdf */ - private static double computeFromDistribution (ProbabilityDistributionCode dcode, HashMap<String,String> params, boolean inverse ) { - - // given value is "quantile" when inverse=false, and it is "probability" when inverse=true + private static double computeFromDistribution(ProbabilityDistributionCode dcode, HashMap<String, String> params, + boolean inverse) { + + // given value is "quantile" when inverse=false, and it is "probability" when + // inverse=true double val = Double.parseDouble(params.get("target")); - + boolean lowertail = true; - if(params.get("lower.tail") != null) { + if (params.get("lower.tail") != null) { lowertail = Boolean.parseBoolean(params.get("lower.tail")); } - + AbstractRealDistribution distFunction = null; - - switch(dcode) { + AbstractIntegerDistribution distIntegerFunction = null; + + switch (dcode) { case NORMAL: - + double mean = 0.0, sd = 1.0; // default values for mean and sd - + String mean_s = params.get("mean"), sd_s = params.get("sd"); - if(mean_s != null) mean = Double.parseDouble(mean_s); - if(sd_s != null) sd = Double.parseDouble(sd_s); - - if ( sd <= 0 ) - throw new DMLRuntimeException("Standard deviation for Normal distribution must be positive (" + sd + ")"); - + if (mean_s != null) + mean = Double.parseDouble(mean_s); + if (sd_s != null) + sd = Double.parseDouble(sd_s); + + if (sd <= 0) + throw new DMLRuntimeException( + "Standard deviation for Normal distribution must be positive (" + sd + ")"); + distFunction = new NormalDistribution(mean, sd); break; - + case EXP: double exp_rate = 1.0; // default value for 1/mean or rate - - if(params.get("rate") != null) exp_rate = Double.parseDouble(params.get("rate")); - if ( exp_rate <= 0 ) { + + if (params.get("rate") != null) + exp_rate = Double.parseDouble(params.get("rate")); + if (exp_rate <= 0) { throw new DMLRuntimeException("Rate for Exponential distribution must be positive (" + exp_rate + ")"); } // For exponential distribution: mean = 1/rate - distFunction = new ExponentialDistribution(1.0/exp_rate); + distFunction = new ExponentialDistribution(1.0 / exp_rate); break; - + case CHISQ: - if ( params.get("df") == null ) { - throw new DMLRuntimeException("" + - "Degrees of freedom must be specified for chi-squared distribution " + - "(e.g., q=qchisq(0.5, df=20); p=pchisq(target=q, df=1.2))"); + if (params.get("df") == null) { + throw new DMLRuntimeException("" + "Degrees of freedom must be specified for chi-squared distribution " + + "(e.g., q=qchisq(0.5, df=20); p=pchisq(target=q, df=1.2))"); } int df = UtilFunctions.parseToInt(params.get("df")); - - if ( df <= 0 ) { - throw new DMLRuntimeException("Degrees of Freedom for chi-squared distribution must be positive (" + df + ")"); + + if (df <= 0) { + throw new DMLRuntimeException( + "Degrees of Freedom for chi-squared distribution must be positive (" + df + ")"); } distFunction = new ChiSquaredDistribution(df); break; - + case F: - if ( params.get("df1") == null || params.get("df2") == null ) { - throw new DMLRuntimeException("" + - "Degrees of freedom must be specified for F distribution " + - "(e.g., q = qf(target=0.5, df1=20, df2=30); p=pf(target=q, df1=20, df2=30))"); + if (params.get("df1") == null || params.get("df2") == null) { + throw new DMLRuntimeException("" + "Degrees of freedom must be specified for F distribution " + + "(e.g., q = qf(target=0.5, df1=20, df2=30); p=pf(target=q, df1=20, df2=30))"); } int df1 = UtilFunctions.parseToInt(params.get("df1")); int df2 = UtilFunctions.parseToInt(params.get("df2")); - if ( df1 <= 0 || df2 <= 0) { - throw new DMLRuntimeException("Degrees of Freedom for F distribution must be positive (" + df1 + "," + df2 + ")"); + if (df1 <= 0 || df2 <= 0) { + throw new DMLRuntimeException( + "Degrees of Freedom for F distribution must be positive (" + df1 + "," + df2 + ")"); } distFunction = new FDistribution(df1, df2); break; - + case T: - if ( params.get("df") == null ) { - throw new DMLRuntimeException("" + - "Degrees of freedom is needed to compute probabilities from t distribution " + - "(e.g., q = qt(target=0.5, df=10); p = pt(target=q, df=10))"); + if (params.get("df") == null) { + throw new DMLRuntimeException( + "" + "Degrees of freedom is needed to compute probabilities from t distribution " + + "(e.g., q = qt(target=0.5, df=10); p = pt(target=q, df=10))"); } int t_df = UtilFunctions.parseToInt(params.get("df")); - if ( t_df <= 0 ) { + if (t_df <= 0) { throw new DMLRuntimeException("Degrees of Freedom for t distribution must be positive (" + t_df + ")"); } distFunction = new TDistribution(t_df); break; - + + case BINOMIAL: + try { + if (!inverse) + Integer.parseInt(params.get("target")); + } catch (NumberFormatException e) { + throw new DMLRuntimeException( + "" + "Target needs to be an integer " + "(e.g., p=pbinomial(target=1, trials=10, p=0.3))("+val+")"); + } + + int trials; + if (params.get("trials") == null) { + throw new DMLRuntimeException("" + "Number of trials must be specified for binomial distribution " + + "(e.g., p=pbinomial(target=q, trials=10))"); + } + + try { + trials = Integer.parseInt(params.get("trials")); + } catch (NumberFormatException e) { + throw new DMLRuntimeException( + "" + "trials needs to be an integer " + "(e.g., p=pbinomial(target=1, trials=10, p=0.3)"); + } + + if (trials < 0) { + throw new DMLRuntimeException( + "Number of trials must be positive (NotPositiveException - if trials < 0) (" + trials + ")"); + } + + if (params.get("p") == null) { + throw new DMLRuntimeException("" + "Probability of success must be specified for binomial distribution " + + "(e.g., p=pbinomial(target=1, trials=10, p=0.3))"); + } + + double p = UtilFunctions.parseToDouble(params.get("p")); + if (p < 0 || p > 1) { + throw new DMLRuntimeException( + "" + "Probability of success must be 0<=p<=1 (OutOfRangeException) (" + p + ")"); + } + + distIntegerFunction = new BinomialDistribution(trials, p); + break; + default: throw new DMLRuntimeException("Invalid distribution code: " + dcode); } - + double ret = Double.NaN; - if(inverse) { + if (inverse) { // inverse cdf - ret = distFunction.inverseCumulativeProbability(val); - } - else if(lowertail) { + ret = (distIntegerFunction == null) ? distFunction.inverseCumulativeProbability(val) + : distIntegerFunction.inverseCumulativeProbability(val); + } else if (lowertail) { // cdf (lowertail) - ret = distFunction.cumulativeProbability(val); - } - else { + ret = (distIntegerFunction == null) ? distFunction.cumulativeProbability(val) + : distIntegerFunction.cumulativeProbability((int) val); + } else { // cdf (upper tail) - - // TODO: more accurate distribution-specific computation of upper tail probabilities - ret = 1.0 - distFunction.cumulativeProbability(val); + + // TODO: more accurate distribution-specific computation of upper tail + // probabilities + ret = 1.0 - ((distIntegerFunction == null) ? distFunction.cumulativeProbability(val) + : distIntegerFunction.cumulativeProbability((int) val)); } - + return ret; } } http://git-wip-us.apache.org/repos/asf/systemml/blob/bc6e941c/src/test/java/org/apache/sysml/test/integration/functions/unary/scalar/FullDistributionTest.java ---------------------------------------------------------------------- diff --git a/src/test/java/org/apache/sysml/test/integration/functions/unary/scalar/FullDistributionTest.java b/src/test/java/org/apache/sysml/test/integration/functions/unary/scalar/FullDistributionTest.java index 92707a0..a3ef11f 100644 --- a/src/test/java/org/apache/sysml/test/integration/functions/unary/scalar/FullDistributionTest.java +++ b/src/test/java/org/apache/sysml/test/integration/functions/unary/scalar/FullDistributionTest.java @@ -44,7 +44,7 @@ public class FullDistributionTest extends AutomatedTestBase private enum TEST_TYPE { NORMAL, NORMAL_NOPARAMS, NORMAL_MEAN, - NORMAL_SD, F, T, CHISQ, EXP, EXP_NOPARAMS + NORMAL_SD, BINOMIAL, F, T, CHISQ, EXP, EXP_NOPARAMS } @@ -75,6 +75,21 @@ public class FullDistributionTest extends AutomatedTestBase } @Test + public void testBinomiaCP() { + runDFTest(TEST_TYPE.BINOMIAL, true, 10.0, null, ExecType.CP); + } + + @Test + public void testBinomiaSpark() { + runDFTest(TEST_TYPE.BINOMIAL, true, 10.0, null, ExecType.SPARK); + } + + @Test + public void testBinomiaMR() { + runDFTest(TEST_TYPE.BINOMIAL, true, 10.0, null, ExecType.MR); + } + + @Test public void testTCP() { runDFTest(TEST_TYPE.T, true, 10.0, null, ExecType.CP); } @@ -223,6 +238,11 @@ public class FullDistributionTest extends AutomatedTestBase rCmd = "Rscript" + " " + fullRScriptName + " " + Double.toString(in) + " " + Double.toString(param1) + " " + Double.toString(param2) + " " + expected("dfout"); break; + case BINOMIAL: + programArgs = new String[]{"-args", Double.toString(in), Integer.toString(param1.intValue()), output("dfout") }; + rCmd = "Rscript" + " " + fullRScriptName + " " + Double.toString(in) + " " + Integer.toString(param1.intValue()) + " " + expected("dfout"); + break; + default: throw new RuntimeException("Invalid distribution function: " + type); } http://git-wip-us.apache.org/repos/asf/systemml/blob/bc6e941c/src/test/scripts/functions/unary/scalar/DFTest_BINOMIAL.R ---------------------------------------------------------------------- diff --git a/src/test/scripts/functions/unary/scalar/DFTest_BINOMIAL.R b/src/test/scripts/functions/unary/scalar/DFTest_BINOMIAL.R new file mode 100644 index 0000000..e049d34 --- /dev/null +++ b/src/test/scripts/functions/unary/scalar/DFTest_BINOMIAL.R @@ -0,0 +1,35 @@ +#------------------------------------------------------------- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +#------------------------------------------------------------- + +args <- commandArgs(TRUE) +library(Matrix) + +t1 = as.numeric(args[1]) +t2 = as.numeric(args[2]) +t3 = args[3] + +p = pbinom(q=t2, size=20, prob=0.25, lower.tail=TRUE) +pl = pbinom(q=t2, size=20, prob=0.25, lower.tail=FALSE) +q = qbinom(p=t1, size=20, prob=0.25) + +res = rbind(as.matrix(p), as.matrix(pl), as.matrix(as.double(q))) + +writeMM(as(res, "CsparseMatrix"), t3) http://git-wip-us.apache.org/repos/asf/systemml/blob/bc6e941c/src/test/scripts/functions/unary/scalar/DFTest_BINOMIAL.dml ---------------------------------------------------------------------- diff --git a/src/test/scripts/functions/unary/scalar/DFTest_BINOMIAL.dml b/src/test/scripts/functions/unary/scalar/DFTest_BINOMIAL.dml new file mode 100644 index 0000000..3baa69b --- /dev/null +++ b/src/test/scripts/functions/unary/scalar/DFTest_BINOMIAL.dml @@ -0,0 +1,45 @@ +#------------------------------------------------------------- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +#------------------------------------------------------------- + +# test pbinomial and cdf to be same with lower tail TRUE +p = pbinomial (target=$2, trials=20, p=0.25) +p1 = pbinomial (target=$2, trials=20, p=0.25, lower.tail=TRUE) +pc = cdf (target=$2, trials=20, p=0.25, dist="binomial") +pc1 = cdf (target=$2, trials=20, p=0.25, dist="binomial", lower.tail=TRUE) + +if ((p != p1) | (p != pc) | (p != pc1)) { p = NaN } + +# test pbinomial and cdf be same with low tail FALSE +pl = pbinomial (target=$2, trials=20, p=0.25, lower.tail=FALSE) +pcl = cdf (target=$2, trials=20, p=0.25, lower.tail=FALSE, dist="binomial") + +if (pl != pcl) { pl = NaN } + +# test qbinomial and icdf to be same +q = qbinomial (target=$1, trials=20, p=0.25) +qc = icdf (target=$1, trials=20, p=0.25, dist="binomial") + +if (q != qc) { q = NaN } + +# produce pbinomial with lower TRUE/FALSE and qbinomial as output +res = rbind(as.matrix(p), as.matrix(pl), as.matrix(q)) + +write(res, $3) \ No newline at end of file
