[
https://issues.apache.org/jira/browse/MAHOUT-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030967#comment-14030967
]
ASF GitHub Bot commented on MAHOUT-1580:
----------------------------------------
Github user dlyubimov commented on the pull request:
https://github.com/apache/mahout/pull/17#issuecomment-46044957
I guess I am a bit conflicted.
On one hand, if the abstraction's contract says one thing, a concrete
implementation definitely must not do something else -- even if that means
inefficiency. If approximation is desired, the contract must be very explicit
about it. The current contract reads:
/**
 * Return the number of non zero elements in the vector.
 *
 * @return an int
 */
int getNumNonZeroElements();
Actually, I've been using this for some time and have always assumed this is
what it does -- reports the non-zero elements. In that sense, I
wholeheartedly support this patch, since I firmly believe it is a better
situation to be in than the current state of things. Nothing can trump the
declared abstract contract: if there's a faster, less accurate alternative,
just create another, perhaps optional, contract that explicitly says so. But you
can't have a contract that says "number of non-zero elements" and return, say, 50
when the true number of non-zero elements is 10.
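To make the contract point concrete, here is a rough sketch in Java. It is not Mahout's actual code: CountingVector, DenseArrayVector, and getNumNonZeroElementsUpperBound are hypothetical names made up for illustration. The exact method keeps the declared contract with a plain loop over the backing array, and the cheap answer lives behind a separate method whose contract says it is only an upper bound.

```java
// Hypothetical sketch; these types are NOT Mahout's actual classes.
interface CountingVector {
    /** Exact number of non-zero elements; must never be an estimate. */
    int getNumNonZeroElements();

    /** Separate, explicitly approximate contract: a cheap upper bound on the count. */
    int getNumNonZeroElementsUpperBound();
}

final class DenseArrayVector implements CountingVector {
    private final double[] values;

    DenseArrayVector(double[] values) {
        this.values = values.clone(); // defensive copy
    }

    @Override
    public int getNumNonZeroElements() {
        // Plain loop over the backing array; no iterator objects are created.
        int count = 0;
        for (double v : values) {
            if (v != 0.0) {
                count++;
            }
        }
        return count;
    }

    @Override
    public int getNumNonZeroElementsUpperBound() {
        // Cheap, but only an upper bound; the contract above says so.
        return values.length;
    }
}

final class ContractDemo {
    public static void main(String[] args) {
        CountingVector v = new DenseArrayVector(new double[] {0.0, 1.5, 0.0, -2.0, 0.0});
        System.out.println(v.getNumNonZeroElements());           // prints 2
        System.out.println(v.getNumNonZeroElementsUpperBound()); // prints 5
    }
}
```

A caller that only needs to size a buffer can use the upper bound, while anything that depends on the true count calls the exact method.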
On the other hand, it is painfully enticing to implement this as an
internal counter rather than a computation -- by controlling all assignment
flows, which might be organizationally difficult. The analogy here is that, say,
hash sets always know their cardinality without having to compute it.
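For illustration, the counter idea could look roughly like the Java sketch below. TrackedDenseVector is a hypothetical class, not anything in Mahout, and it assumes every write goes through set(). That is exactly the part that is hard to guarantee once there are bulk assigns, views, and direct array access.

```java
// Hypothetical sketch; NOT a Mahout class. Assumes all writes go through set().
final class TrackedDenseVector {
    private final double[] values;
    private int numNonZero; // maintained on every write instead of recomputed

    TrackedDenseVector(int size) {
        this.values = new double[size];
    }

    void set(int index, double value) {
        boolean wasNonZero = values[index] != 0.0;
        boolean isNonZero = value != 0.0;
        if (wasNonZero && !isNonZero) {
            numNonZero--; // a non-zero slot was overwritten with zero
        } else if (!wasNonZero && isNonZero) {
            numNonZero++; // a zero slot became non-zero
        }
        values[index] = value;
    }

    double get(int index) {
        return values[index];
    }

    /** O(1): returns the tracked count, much like a hash set returning its size. */
    int getNumNonZeroElements() {
        return numNonZero;
    }
}

final class CounterDemo {
    public static void main(String[] args) {
        TrackedDenseVector v = new TrackedDenseVector(4);
        v.set(0, 1.0);
        v.set(1, 2.0);
        v.set(0, 3.0); // overwriting non-zero with non-zero leaves the count alone
        System.out.println(v.getNumNonZeroElements()); // prints 2
        v.set(1, 0.0);
        System.out.println(v.getNumNonZeroElements()); // prints 1
    }
}
```

The trade-off is visible in set(): every modification pays for two extra comparisons so that the count query becomes constant time.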
So I'd say: if count tracking is difficult (or viewed as the more expensive
option, since it obviously adds load to modifications), then commit this.
> Optimize getNumNonZeroElements
> ------------------------------
>
> Key: MAHOUT-1580
> URL: https://issues.apache.org/jira/browse/MAHOUT-1580
> Project: Mahout
> Issue Type: Improvement
> Components: Math
> Reporter: Sebastian Schelter
> Assignee: Sebastian Schelter
> Fix For: 1.0
>
>
> getNumNonZeroElements in AbstractVector uses the nonZeroes iterator
> internally which adds a lot of overhead for certain types of vectors, e.g.
> the dense ones. We should add custom implementations here.