[ https://issues.apache.org/jira/browse/MAHOUT-653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017093#comment-13017093 ]

Ted Dunning commented on MAHOUT-653:
------------------------------------

Are these really faster by enough to matter?  Do we have any code that is
CPU bound on distance computations rather than I/O bound?  I expect not,
but I do not know.

Also, the fractional-order L_k measures are pretty controversial.  L_1 does a
fantastic job of sparsification without sacrificing convergence properties.
I know that both Google and Yahoo use L_1 regularization for on-line
SGD-based learners.

The IBM paper provided has some serious issues (at first glance).  In
particular, they use a theoretical figure of merit that is based on contrast
for a particular distribution of points.  Unfortunately, the choice of
distribution they made almost always yields distributions that are
substantially not invariant under rotation.  Since they are proving bounds
over all possible distributions, this choice of an almost always
pathological case gives bounds that are dominated by the pathological case.
Most of the time, these distributions will have points that are highly
oriented along the axes, which may or may not be realistic.  Data with this
property trivially favors L_k metrics with fractional values of k.
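
To make the rotation point concrete, here is a minimal sketch (not taken from
the paper or the patch; the class and method names are hypothetical). It
computes sum(|a_i - b_i|^k)^(1/k) for a point lying on an axis and for the
same point rotated by 45 degrees. L_2 is unchanged by the rotation, while the
fractional k = 0.5 measure grows sharply once the point leaves the axis.

```java
// Hypothetical illustration only; not part of MAHOUT-653.
final class FractionalNormSketch {

    // Minkowski-style distance sum(|a_i - b_i|^k)^(1/k).
    // For k < 1 the triangle inequality fails, so this is not a true metric.
    static double lk(double[] a, double[] b, double k) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.pow(Math.abs(a[i] - b[i]), k);
        }
        return Math.pow(sum, 1.0 / k);
    }

    // Rotate a 2-D point counterclockwise by theta radians about the origin.
    static double[] rotate(double[] p, double theta) {
        double c = Math.cos(theta);
        double s = Math.sin(theta);
        return new double[] {c * p[0] - s * p[1], s * p[0] + c * p[1]};
    }

    public static void main(String[] args) {
        double[] origin = {0.0, 0.0};
        double[] onAxis = {1.0, 0.0};
        double[] rotated = rotate(onAxis, Math.PI / 4);
        for (double k : new double[] {0.5, 1.0, 2.0}) {
            System.out.printf("k=%.1f  on-axis=%.4f  rotated=%.4f%n",
                k, lk(origin, onAxis, k), lk(origin, rotated, k));
        }
    }
}
```

With axis-aligned data the fractional measure sees artificially short
distances along the axes, which is the sense in which such data favors it.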

Given this issue with the basic premises and the well-known convergence
issues, I would recommend that folks be pretty cautious with these results.



> Approximations to standard functions
> ------------------------------------
>
>                 Key: MAHOUT-653
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-653
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Lance Norskog
>         Attachments: MAHOUT-653.patch, MAHOUT-653.patch
>
>
> These give approximate versions of pow(value, exponent), exp(value),
> and natural log(value).
> log() and exp() stolen from:
> [http://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/]
> pow() stolen from:
> [http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/]
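
For readers without the patch at hand, the following is a rough sketch of the
kind of bit-level exp() approximation described in the first linked post. It
is an assumption that the attached patch looks like this; the class name and
the error-measuring helper are hypothetical. The idea is to write a scaled and
shifted copy of the argument into the high 32 bits of an IEEE-754 double, so
that the exponent field does most of the work.

```java
// Hypothetical sketch; the actual MAHOUT-653 patch may differ.
final class FastExpSketch {

    // 1512775 is roughly 2^20 / ln(2); 1072693248 is 1023 * 2^20 (the
    // exponent bias placed in the high word); 60801 is a small shift that
    // trades bias for lower average error.
    static double fastExp(double val) {
        final long tmp = (long) (1512775 * val + (1072693248 - 60801));
        return Double.longBitsToDouble(tmp << 32);
    }

    // Largest relative error against Math.exp on an evenly spaced grid.
    static double maxRelativeError(double lo, double hi, int steps) {
        double worst = 0.0;
        for (int i = 0; i <= steps; i++) {
            double x = lo + (hi - lo) * i / steps;
            double exact = Math.exp(x);
            worst = Math.max(worst, Math.abs(fastExp(x) - exact) / exact);
        }
        return worst;
    }

    public static void main(String[] args) {
        System.out.println("max relative error on [-5, 5]: "
            + maxRelativeError(-5.0, 5.0, 10000));
    }
}
```

The relative error of this style of approximation is on the order of a few
percent, so whether it is acceptable depends on how sensitive the calling
algorithm is, in addition to whether the call is actually a bottleneck.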

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
