[ https://issues.apache.org/jira/browse/MAHOUT-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030967#comment-14030967 ]

ASF GitHub Bot commented on MAHOUT-1580:
----------------------------------------

Github user dlyubimov commented on the pull request:

    https://github.com/apache/mahout/pull/17#issuecomment-46044957
  
    I guess I am a bit conflicted.
    
    On one hand, if the abstraction's contract says one thing, a concrete
    implementation definitely must not do something else, even if that means
    inefficiency. (If approximation is desired, the contract must be very
    explicit about it.) The contract here reads:
    
        /**
         * Return the number of non zero elements in the vector.
         *
         * @return an int
         */
        int getNumNonZeroElements();
    
    Actually, I've been using this for some time and have always assumed this
    is what it does: it reports the number of non-zero elements. In that sense,
    I wholeheartedly support this patch, since I firmly believe it puts us in a
    better situation than the current state of things. Nothing can trump the
    declared abstract contract; if there's a faster but less accurate
    alternative, just create another, perhaps optional, contract that explicitly
    says so. But you can't have a contract that says "number of non-zero
    elements" and return, say, 50 when the true number of non-zero elements
    is 10.
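
    To make the "separate contract" suggestion concrete, here is a minimal
    sketch, assuming a map-backed sparse vector in which explicitly stored
    entries may hold zeros. CountingVector, MapSparseVector and
    getNonZeroUpperBound are hypothetical names for illustration, not Mahout's
    API; the point is only that the exact count and the cheap bound sit behind
    differently named methods, so callers always know which one they get.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical interface: exact count and cheap bound are separate, explicitly named contracts. */
interface CountingVector {
  /** Exact number of non-zero elements; implementations must never approximate. */
  int getNumNonZeroElements();

  /**
   * Hypothetical optional contract: a cheap upper bound on the number of non-zero
   * elements (here, the number of stored entries, some of which may be zero).
   */
  int getNonZeroUpperBound();
}

/** Hypothetical sparse vector backed by a map; stored entries may hold explicit zeros. */
class MapSparseVector implements CountingVector {
  private final Map<Integer, Double> entries = new HashMap<>();

  void set(int index, double value) {
    entries.put(index, value); // explicit zeros stay stored, inflating the bound
  }

  @Override
  public int getNumNonZeroElements() {
    // Exact: inspect every stored value and count only the non-zero ones.
    int count = 0;
    for (double v : entries.values()) {
      if (v != 0.0) {
        count++;
      }
    }
    return count;
  }

  @Override
  public int getNonZeroUpperBound() {
    // Cheap: O(1), but may over-count when zeros are stored explicitly.
    return entries.size();
  }
}
```

    With entries set at index 0 (1.0), index 3 (0.0) and index 7 (-2.5), the
    exact count is 2 while the stored-entry bound is 3: the kind of gap an
    "exact" contract must not hide.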
    
    
    On the other hand, it is very painfully enticing to implement this as an
    internal counter rather than a computation, by controlling all assignment
    flows, which might be organizationally difficult. The analogy here is that
    hash sets always know their cardinality without having to compute it.
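
    A rough sketch of the counter idea, assuming a plain array-backed dense
    vector where every write goes through a single set method
    (CountTrackingDenseVector is a hypothetical class, not part of Mahout).
    Each write pays one extra comparison to keep the count current, and in
    return getNumNonZeroElements becomes O(1). The catch is the one raised
    above: any write path that bypasses set (bulk assignment, views, direct
    array access) would silently break the invariant.

```java
/**
 * Hypothetical dense vector that maintains its non-zero count on every write,
 * so getNumNonZeroElements() is O(1) instead of a full scan.
 */
class CountTrackingDenseVector {
  private final double[] values;
  private int nonZeroCount; // invariant: number of i with values[i] != 0.0

  CountTrackingDenseVector(int size) {
    this.values = new double[size]; // all zeros, so the count starts at 0
  }

  /** Every assignment must go through here, otherwise the counter drifts. */
  void set(int index, double value) {
    boolean wasNonZero = values[index] != 0.0;
    boolean isNonZero = value != 0.0;
    if (wasNonZero && !isNonZero) {
      nonZeroCount--; // non-zero -> zero
    } else if (!wasNonZero && isNonZero) {
      nonZeroCount++; // zero -> non-zero
    }
    values[index] = value; // non-zero -> non-zero or zero -> zero leaves the count unchanged
  }

  double get(int index) {
    return values[index];
  }

  int getNumNonZeroElements() {
    return nonZeroCount;
  }
}
```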
    
    So, if count tracking is difficult (or viewed as the more expensive option,
    since it obviously adds load to modifications), I'd say commit this.



> Optimize getNumNonZeroElements
> ------------------------------
>
>                 Key: MAHOUT-1580
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1580
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>            Reporter: Sebastian Schelter
>            Assignee: Sebastian Schelter
>             Fix For: 1.0
>
>
> getNumNonZeroElements in AbstractVector uses the nonZeroes iterator
> internally, which adds a lot of overhead for certain types of vectors, e.g.
> the dense ones. We should add custom implementations here.
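
A minimal sketch of the kind of custom implementation the issue asks for,
assuming a dense vector backed by a double[]. SimpleDenseVector,
countViaIterator and nonZeroIndices are hypothetical names, not Mahout's
actual DenseVector or AbstractVector code. The generic route allocates an
iterator and returns each index as a boxed Integer; the specialized
getNumNonZeroElements is a single loop over the backing array. Both return
the exact count, so the specialization changes the cost, not the contract.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a dense vector; not Mahout's actual DenseVector. */
class SimpleDenseVector {
  private final double[] values;

  SimpleDenseVector(double[] values) {
    this.values = values.clone();
  }

  /** Generic approach: walk an iterator over non-zero entries and count them. */
  int countViaIterator() {
    int count = 0;
    Iterator<Integer> it = nonZeroIndices();
    while (it.hasNext()) {
      it.next();
      count++;
    }
    return count;
  }

  /** Specialized approach: one tight loop over the backing array, no iterator or boxing. */
  int getNumNonZeroElements() {
    int count = 0;
    for (double v : values) {
      if (v != 0.0) {
        count++;
      }
    }
    return count;
  }

  /** Iterator over the indices of non-zero entries (allocates an object, boxes each index). */
  private Iterator<Integer> nonZeroIndices() {
    return new Iterator<Integer>() {
      private int next = advance(0);

      // Skip forward to the next non-zero slot, or values.length if none remain.
      private int advance(int from) {
        int i = from;
        while (i < values.length && values[i] == 0.0) {
          i++;
        }
        return i;
      }

      @Override
      public boolean hasNext() {
        return next < values.length;
      }

      @Override
      public Integer next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        int current = next;
        next = advance(current + 1);
        return current;
      }
    };
  }
}
```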



--
This message was sent by Atlassian JIRA
(v6.2#6252)
