[ https://issues.apache.org/jira/browse/MAHOUT-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Suneel Marthi updated MAHOUT-1232:
----------------------------------
Status: Patch Available (was: Open)
When the number of non-zero elements in the input vector is less than maxEntries,
the call to queue.pop() returns a Pair<null, null>. The subsequent check
pair.getFirst() > -1 then auto-unboxes the null Integer returned by getFirst(),
which throws a NullPointerException.
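A minimal standalone illustration of that failure mode, not Mahout code: the class and method names below are hypothetical, and the only assumption is the one stated above, that the popped Pair carries a null Integer as its first element. Comparing a boxed Integer with an int literal forces unboxing, so a null reference throws; checking for null first avoids it.

```java
// Hypothetical demo class (not part of Mahout) showing why the guard in
// topEntries() throws when the popped Pair has a null first element.
class UnboxingNpeDemo {

  // Mirrors `pair.getFirst() > -1`: the comparison auto-unboxes `first`,
  // which throws a NullPointerException if `first` is null.
  static boolean isRealIndex(Integer first) {
    return first > -1;
  }

  // Null-safe variant: short-circuits before unboxing a null reference.
  static boolean isRealIndexSafe(Integer first) {
    return first != null && first > -1;
  }

  public static void main(String[] args) {
    System.out.println(isRealIndexSafe(null)); // false
    System.out.println(isRealIndexSafe(3));    // true
    try {
      isRealIndex(null);
    } catch (NullPointerException expected) {
      System.out.println("NPE from unboxing a null Integer");
    }
  }
}
```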
> VectorHelper.topEntries() throws a NPE when number of NonZero elements in
> vector < maxEntries
> ---------------------------------------------------------------------------------------------
>
> Key: MAHOUT-1232
> URL: https://issues.apache.org/jira/browse/MAHOUT-1232
> Project: Mahout
> Issue Type: Bug
> Components: Integration
> Affects Versions: 0.7, 0.8
> Reporter: Suneel Marthi
> Assignee: Suneel Marthi
> Fix For: 0.8
>
> Attachments: MAHOUT-1232.patch
>
>
> The vectordump utility throws a NullPointerException when sorting is enabled
> (-sort) and the number of non-zero elements in an input vector is less than
> the specified vector size (-vs).
> {Code}
> mahout vectordump -i reuters-vectors/tfidf-vectors -dt sequencefile \
>   -d reuters-vectors/dictionary.file-* -vs 15 -ni 30 -o vectordump -p true \
>   -sort reuters-vectors/tfidf-vectors
> INFO: Sort? true
> Exception in thread "main" java.lang.NullPointerException
>     at org.apache.mahout.utils.vectors.VectorHelper.topEntries(VectorHelper.java:89)
>     at org.apache.mahout.utils.vectors.VectorHelper.vectorToJson(VectorHelper.java:135)
>     at org.apache.mahout.utils.vectors.VectorDumper.run(VectorDumper.java:242)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.mahout.utils.vectors.VectorDumper.main(VectorDumper.java:262)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> {Code}
> The problem is in the following block of VectorHelper.topEntries(), which runs
> when sorting is enabled:
> {Code}
> for (Element e : vector.nonZeroes()) {
>   queue.insertWithOverflow(Pair.of(e.index(), e.get()));
> }
> List<Pair<Integer, Double>> entries = Lists.newArrayList();
> Pair<Integer, Double> pair;
> while ((pair = queue.pop()) != null) {
>   // NPE here: getFirst() returns null for unfilled slots, and the
>   // comparison auto-unboxes it.
>   if (pair.getFirst() > -1) {
>     entries.add(pair);
>   }
> }
> {Code}
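For comparison, a sketch of bounded top-k selection that never produces placeholder entries, so there is nothing null to unbox when the vector has fewer non-zeros than maxEntries. This is not the attached MAHOUT-1232.patch and not Mahout's API: it assumes a plain double[] input instead of a Mahout Vector, uses java.util.PriorityQueue as a min-heap instead of Mahout's queue, and assumes entries are ranked by raw value (largest first). All names are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch, not Mahout code: top-k non-zero entries of a dense array.
class TopEntriesSketch {

  // Returns up to maxEntries (index, value) pairs with the largest values, in
  // descending order of value. Zeros are skipped (like iterating nonZeroes()),
  // so fewer non-zeros than maxEntries simply yields a shorter list.
  static List<Map.Entry<Integer, Double>> topEntries(double[] vector, int maxEntries) {
    List<Map.Entry<Integer, Double>> entries = new ArrayList<>();
    if (maxEntries <= 0) {
      return entries;
    }
    // Min-heap on value: the head is the smallest of the entries kept so far.
    PriorityQueue<Map.Entry<Integer, Double>> queue =
        new PriorityQueue<Map.Entry<Integer, Double>>(
            (a, b) -> Double.compare(a.getValue(), b.getValue()));
    for (int i = 0; i < vector.length; i++) {
      double v = vector[i];
      if (v == 0.0) {
        continue;
      }
      queue.offer(new AbstractMap.SimpleImmutableEntry<>(i, v));
      if (queue.size() > maxEntries) {
        queue.poll(); // evict the current minimum to keep at most maxEntries
      }
    }
    // poll() returns null only once the queue is empty; no sentinel pairs exist.
    Map.Entry<Integer, Double> e;
    while ((e = queue.poll()) != null) {
      entries.add(e);
    }
    Collections.reverse(entries); // ascending by value -> descending by value
    return entries;
  }

  public static void main(String[] args) {
    // Two non-zeros, maxEntries = 15: returns both entries instead of failing.
    System.out.println(topEntries(new double[] {0.0, 3.0, 0.0, 1.0}, 15));
  }
}
```

The min-heap keeps only the current best maxEntries candidates, so the loop costs O(n log k) and the result size is min(non-zeros, maxEntries) by construction rather than by filtering sentinels afterwards.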
--
This message is automatically generated by JIRA.