Thank you Dylan. The paper seems interesting. The abstract reads as follows : "We consider the problem of doing fast and reliable estimation of the number z of non-zero entries in a sparse boolean matrix product"... I think they maybe doing this for boolean matrices. The code that I pointed to in SystemML is used for all matrices. I am not sure if and how their technique translates to non-boolean matrices.
Also, I am asking for an explanation of what is implemented as opposed to what can be. I was wondering if the neat looking formula has been gotten from some literature somewhere? -Nakul On Tue, Jun 20, 2017 at 1:42 PM, Dylan Hutchison <[email protected] > wrote: > In case it is helpful, this 2010 paper > <https://www.semanticscholar.org/paper/Better-Size- > Estimation-for-Sparse-Matrix-Products-Amossen-Campagna/ > 52f70406bb4d887e1b79af81a746184c5723848a> > is my favorite for estimating the nnz of a matrix product with high > confidence. The algorithm is much more involved than the simple SystemML > calculation, because it looks at samples from the actual data. > > Here are some notes I have on the paper: > > There is a sketch algorithm [2] that gives a (1 + ε) approximation z̃ to z > for any ε > 4 / (z^.25) in O(m) > time and with a bound on the number of I/Os. In relational algebra, this is > estimating > |π_{ik}( A(i, j) ⋈ B(j, k) )|. The idea behind the algorithm is finding, > for some tuning parameter > k that varies with z, the k-smallest value of a hash function h(i; k). The > larger this value is, the more > is and ks that match for a given j. Repeat this for all js. Different (i, > k)s contribute to different entries > in the result matrix and have different hash values. A sketch for this > algorithm can be incrementally > maintained via independent samples on is and ks. Computing the z̃ estimate > from the sample takes o(n) > time for large enough z. The larger z is, the smaller a sample size we > need. (Need large samples for > small z.) (Everything above assumes the zero product property and > zero-sum-free). > > > > On Tue, Jun 20, 2017 at 12:06 PM, Nakul Jindal <[email protected]> wrote: > > > Hi, > > > > There is code in systemml to “estimate” the output sparsity of a matrix > > multiplication operation between two sparse matrices. > > Is there a citation for this? If not, have we done any tests to figure > out > > how good these estimates are? > > https://github.com/nakul02/systemml/blob/df8d4a63d8d09cae94b > > 6ca2634e31da554302c72/src/main/java/org/apache/sysml/ > > hops/OptimizerUtils.java#L944-L953 > > > > Thanks, > > Nakul > > >
