Hi, On Tue, Jun 26, 2012 at 7:29 PM, Matthew Brett <matthew.br...@gmail.com> wrote: > Hi, > > On Tue, Jun 26, 2012 at 5:04 PM, Charles R Harris > <charlesr.har...@gmail.com> wrote: >> >> >> On Tue, Jun 26, 2012 at 5:46 PM, Matthew Brett <matthew.br...@gmail.com> >> wrote: >>> >>> Hi, >>> >>> On Tue, Jun 26, 2012 at 4:39 PM, Benjamin Root <ben.r...@ou.edu> wrote: >>> > >>> > >>> > On Tuesday, June 26, 2012, Charles R Harris wrote: >>> >> >>> >> >>> >> >>> >> On Tue, Jun 26, 2012 at 3:42 PM, Matthew Brett >>> >> <matthew.br...@gmail.com> >>> >> wrote: >>> >> >>> >> Hi, >>> >> >>> >> On Mon, Jun 18, 2012 at 3:50 PM, Matthew Brett >>> >> <matthew.br...@gmail.com> >>> >> wrote: >>> >> > Hi, >>> >> > >>> >> > On Sun, Jun 17, 2012 at 7:22 PM, Charles R Harris >>> >> > <charlesr.har...@gmail.com> wrote: >>> >> >> >>> >> >> >>> >> >> On Sat, Jun 16, 2012 at 2:33 PM, Matthew Brett >>> >> >> <matthew.br...@gmail.com> >>> >> >> wrote: >>> >> >>> >>> >> >>> Hi, >>> >> >>> >>> >> >>> On Sat, Jun 16, 2012 at 8:03 PM, Matthew Brett >>> >> >>> <matthew.br...@gmail.com> >>> >> >>> wrote: >>> >> >>> > Hi, >>> >> >>> > >>> >> >>> > On Sat, Jun 16, 2012 at 10:40 AM, Nathaniel Smith <n...@pobox.com> >>> >> >>> > wrote: >>> >> >>> >> On Fri, Jun 15, 2012 at 4:10 AM, Charles R Harris >>> >> >>> >> <charlesr.har...@gmail.com> wrote: >>> >> >>> >>> >>> >> >>> >>> >>> >> >>> >>> On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett >>> One potential problem is that it implies that it will always be the >>> same as any version of matlab's tolerance. What if they change it in >>> a future release? How likely are we to even notice? >>> >>> >>> >> >>> >>> <matthew.br...@gmail.com> >>> >> >>> >>> wrote: >>> >> >>> >>>> >>> >> >>> >>>> Hi, >>> >> >>> >>>> >>> >> >>> >>>> I noticed that numpy.linalg.matrix_rank sometimes gives full >>> >> >>> >>>> rank >>> >> >>> >>>> for >>> >> >>> >>>> matrices that are numerically rank deficient: >>> >> >>> >>>> >>> >> >>> >>>> If I repeatedly make random matrices, then set the first >>> >> >>> >>>> column >>> >> >>> >>>> to be >>> >> >>> >>>> equal to the sum of the second and third columns: >>> >> >>> >>>> >>> >> >>> >>>> def make_deficient(): >>> >> >>> >>>> X = np.random.normal(size=(40, 10)) >>> >> >>> >>>> deficient_X = X.copy() >>> >> >>> >>>> deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] >>> >> >>> >>>> return deficient_X >>> >> >>> >>>> >>> >> >>> >>>> then the current numpy.linalg.matrix_rank algorithm returns >>> >> >>> >>>> full >>> >> >>> >>>> rank >>> >> >>> >>>> (10) in about 8 percent of cases (see appended script). >>> >> >>> >>>> >>> >> >>> >>>> I think this is a tolerance problem. The ``matrix_rank`` >>> >> >>> >>>> algorithm >>> >> >>> >>>> does this by default: >>> >> >>> >>>> >>> >> >>> >>>> S = spl.svd(M, compute_uv=False) >>> >> >>> >>>> tol = S.max() * np.finfo(S.dtype).eps >>> >> >>> >>>> return np.sum(S > tol) >>> >> >>> >>>> >>> >> >>> >>>> I guess we'd we want the lowest tolerance that nearly always >>> >> >>> >>>> or >>> >> >>> >>>> always >>> >> >>> >>>> identifies numerically rank deficient matrices. I suppose one >>> >> >>> >>>> way of >>> >> >>> >>>> looking at whether the tolerance is in the right range is to >>> >> >>> >>>> compare >>> >> >>> >>>> the calculated tolerance (``tol``) to the minimum singular >>> >> >>> >>>> value >>> >> >>> >>>> (``S.min()``) because S.min() in our case should be very small >>> >> >>> >>>> and >>> >> >>> >>>> indicate the rank deficiency. The mean value of tol / S.min() >>> >> >>> >>>> for >>> >> >>> >>>> the >>> >> >>> >>>> current algorithm, across many iterations, is about 2.8. We >>> >> >>> >>>> might >>> >> >>> >>>> hope this value would be higher than 1, but not much higher, >>> >> >>> >>>> otherwise >>> >> >>> >>>> we might be rejecting too many columns. >>> >> >>> >>>> >>> >> >>> >>>> Our current algorithm for tolerance is the same as the 2-norm >>> >> >>> >>>> of >>> >> >>> >>>> M * >>> >> >>> >>>> eps. We're citing Golub and Van Loan for this, but now I look >>> >> >>> >>>> at >>> >> >>> >>>> our >>> >> >>> >>>> copy (p 261, last para) - they seem to be suggesting using u * >>> >> >>> >>>> |M| >>> >> >>> >>>> where u = (p 61, section 2.4.2) eps / 2. (see [1]). I think >>> >> >>> >>>> the >>> >> >>> >>>> Golub >>> >> >>> >> >>> >> I'm fine with that, and agree that it is likely to lead to fewer folks >>> >> wondering why Matlab and numpy are different. A good explanation in the >>> >> function documentation would be useful. >>> >> >>> >> Chuck >>> >> >>> > >>> > One potential problem is that it implies that it will always be the same >>> > as >>> > any version of matlab's tolerance. What if they change it in a future >>> > release? How likely are we to even notice? >>> >>> I guess that matlab is unlikely to change for the same reason that we >>> would be reluctant to change, once we've found an acceptable value. >>> >>> I was thinking that we would say something like: >>> >>> """ >>> The default tolerance is : >>> >>> tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) >>> >>> This corresponds to the tolerance suggested in NR page X, and to the >>> tolerance used by MATLAB at the time of writing (June 2012; see >>> http://www.mathworks.com/help/techdoc/ref/rank.html). >>> """ >>> >>> I don't know whether we would want to track changes made by matlab - >>> maybe we could have that discussion if they do change? >> >> >> I wouldn't bother tracking Matlab, but I think the alternative threshold >> could be mentioned in the notes. Something like >> >> A less conservative threshold is ... >> >> Maybe mention that because of numerical uncertainty there will always be a >> chance that the computed rank could be wrong, but that with the conservative >> threshold the rank is very unlikely to be less than the computed rank. > > Sounds good to me. Would anyone object to a pull request with these > changes (matlab tolerance default, description in docstring)?
Pull request here: https://github.com/numpy/numpy/pull/357 Cheers, Matthew _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion