[
https://issues.apache.org/jira/browse/MAHOUT-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13665109#comment-13665109
]
Dawid Weiss edited comment on MAHOUT-1225 at 5/23/13 12:31 PM:
---------------------------------------------------------------
There are a bunch of issues, actually. In no particular order randomized
testing against JUC allowed me to spot the following:
1. Inconsistent clearing of buffers.
{code}
- Arrays.fill(this.state, 0, state.length - 1, FREE);
+ Arrays.fill(this.state, FREE);
{code}
this eventually leads to endless loops because invariants are no longer
respected.
2. Unreleased references.
{code}
- Arrays.fill(this.state, 0, state.length - 1, FREE);
+ Arrays.fill(this.state, FREE);
+ Arrays.fill(this.table, FREE);
{code}
Some of the generic collections do not clear buffers properly, never allowing
the objects stored inside those buffers to be GCed (unless the following
trimToSize() does it, but I don't think it always does).
3. Inconsistent reference/equals comparisons.
{code}
- while (stat[i] != FREE && (stat[i] == REMOVED || tab[i] != key)) {
+ while (stat[i] != FREE && (stat[i] == REMOVED || !key.equals(tab[i]))) {
{code}
Note that the current code always assumes non-null keys which is not mentioned
anywhere. I assumed non-null keys.
4. Inconsistent return value from add.
{code}
int newCapacity = chooseGrowCapacity(this.distinct + 1,
this.minLoadFactor, this.maxLoadFactor);
rehash(newCapacity);
- return add(key);
+ // we've already added the element so we don't need to add it again.
+ return true;
{code}
This is odd, don't even know why this block is there. But the result is
inconsistent because the rehashing takes place after the key has been added so
a subsequent add() returns an invalid value.
I did *not* go through the code thoroughly, this is a result of a quick test
against JUC. I bet there are other issues in there that may be lurking in the
(untested) shadows. While I appreciate the effort put into this code I still
think it'd be wiser for Mahout to move on to fastutil, trove or something
similar that has a better test coverage.
I've tried to commit this in but my commit rights have been revoked apparently
(perhaps this happens when you're on an emeritus status :).
was (Author: dweiss):
There are a bunch of issues, actually. In no particular order randomized
testing against JUC allowed me to spot the following:
{code}
inconsistent c
> Sets and maps incorrectly clear() their state arrays (potential endless loops)
> ------------------------------------------------------------------------------
>
> Key: MAHOUT-1225
> URL: https://issues.apache.org/jira/browse/MAHOUT-1225
> Project: Mahout
> Issue Type: Bug
> Components: Math
> Affects Versions: 0.7
> Environment: Eclipse, linux Fedora 17, Java 1.7, Mahout Maths
> collections (Set) 0.7, hppc 0.4.3
> Reporter: Sophie Sperner
> Assignee: Dawid Weiss
> Labels: hashset, java, mahout, test
> Fix For: 0.7
>
> Attachments: hppc-0.4.3.jar, MAHOUT-1225.patch, MAHOUT-1225.patch,
> mushroom.dat
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> The code I attached hangs on forever, Eclipse does not print me its stack
> trace because it does not terminate the program. So I decided to make a small
> test.java file that you can easily run.
> This code has the main function that simply runs getItemList() method which
> successfully executes getDataset() method (here please download mushroom.dat
> dataset and set the full path into filePath string variable) and the hangs on
> (the problem happens on a fourth columnValues.add() call). After the dataset
> was taken into X array, the code simply goes through X column by column and
> searches for different items in it.
> If you uncomment IntSet columnValues = new IntOpenHashSet(); and
> corresponding import headers then everything will work just fine (you will
> also need to include hppc jar file found here
> http://labs.carrotsearch.com/hppc.html or below in the attachment).
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira