[
https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13665684#comment-13665684
]
Jake Mannix commented on MAHOUT-1227:
-------------------------------------
Another case I'm not sure about is in your code, Ted:
ModelDissector, line 174: do you really want to look for the largest values
including zeros:
{code}
public Weight(String feature, Vector weights, int n) {
this.feature = feature;
// pick out the weight with the largest abs value, but don't forget the
sign
Queue<Category> biggest = new PriorityQueue<Category>(n + 1,
Ordering.natural());
for (Vector.Element element : weights) {
biggest.add(new Category(element.index(), element.get()));
while (biggest.size() > n) {
biggest.poll();
}
}
categories = Lists.newArrayList(biggest);
Collections.sort(categories, Ordering.natural().reverse());
value = categories.get(0).weight;
maxIndex = categories.get(0).index;
}
{code}
> Vector.iterateNonZero() is super-clumsy to use: add Iterable<Element>
> allNonZero()
> ----------------------------------------------------------------------------------
>
> Key: MAHOUT-1227
> URL: https://issues.apache.org/jira/browse/MAHOUT-1227
> Project: Mahout
> Issue Type: Improvement
> Environment: all
> Reporter: Andy Schlaikjer
> Assignee: Jake Mannix
> Fix For: 0.8
>
> Attachments: MAHOUT-1227.diff
>
>
> Currently, our codebase is littered with the following:
> {code}
> Iterator<Element> it = vector.iterateNonZero();
> while (it.hasNext()) {
> Element e = it.next();
> ...
> {code}
> wouldn't it be nice to be able to do:
> {code}
> for (Element e : vector.allNonZero()) {
> ...
> {code}
> instead?
> I propose adding an Iterable<Element> allNonZero() which allow this syntactic
> sugar. To make it symmetric with iterateAll, let's also add
> Iterable<Element> all(), and implement the simply in AbstractVector.
> The first diff adding this is very non-invasive - new methods added to
> interface, implemented in the three classes from which all Vector
> implementations derive (AbstractVector, NamedVector, and DelegatingVector).
> User code should just work, unless they've implemented their own vector
> without subclassing one of these three (yikes).
> Next diff, which is more invasive, would remove "extends Iterable<Element>"
> from Vector, because using the foreach of a Vector itself is very rarely what
> the caller really means to do (it's the all-iterator, very bad for the more
> common sparse use case). To achieve the same effect, the caller chooses
> between vector.all() and vector.allNonZero(), and then they're being crystal
> clear what they mean.
> Lastly, I'd propose we make iterateAll() and iterateAllNonZero() protected
> methods on AbstractVector, so that we are forced to remove all the clumsy
> places where we do Iterator<Element> it = ... all throughout the codebase. I
> suspect there will be very few places left that really want the raw iterator,
> but if there are any, it can be gotten by calling
> vector.(all/allNonZero).iterator()
> (feature-request/api fix suggestion idea courtesy of Andy Schlaikjer,
> formalized as a proposal and posted up here by me, Jake Mannix)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira