[ 
https://issues.apache.org/jira/browse/MAHOUT-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13665109#comment-13665109
 ] 

Dawid Weiss edited comment on MAHOUT-1225 at 5/23/13 12:31 PM:
---------------------------------------------------------------

There are a bunch of issues, actually. In no particular order randomized 
testing against JUC allowed me to spot the following:

1. Inconsistent clearing of buffers.

{code}
-    Arrays.fill(this.state, 0, state.length - 1, FREE);
+    Arrays.fill(this.state, FREE); 
{code}

this eventually leads to endless loops because invariants are no longer 
respected. 

2. Unreleased references.

{code}
-    Arrays.fill(this.state, 0, state.length - 1, FREE);
+    Arrays.fill(this.state, FREE);
+    Arrays.fill(this.table, FREE); 
{code}

Some of the generic collections do not clear buffers properly, never allowing 
the objects stored inside those buffers to be GCed (unless the following 
trimToSize() does it, but I don't think it always does).

3. Inconsistent reference/equals comparisons.

{code}
-      while (stat[i] != FREE && (stat[i] == REMOVED || tab[i] != key)) {
+      while (stat[i] != FREE && (stat[i] == REMOVED || !key.equals(tab[i]))) { 
{code}

Note that the current code always assumes non-null keys which is not mentioned 
anywhere. I assumed non-null keys.

4. Inconsistent return value from add.

{code}
       int newCapacity = chooseGrowCapacity(this.distinct + 1, 
this.minLoadFactor, this.maxLoadFactor);
       rehash(newCapacity);
-      return add(key);
+      // we've already added the element so we don't need to add it again.
+      return true; 
{code}

This is odd, don't even know why this block is there. But the result is 
inconsistent because the rehashing takes place after the key has been added so 
a subsequent add() returns an invalid value.

I did *not* go through the code thoroughly, this is a result of a quick test 
against JUC. I bet there are other issues in there that may be lurking in the 
(untested) shadows. While I appreciate the effort put into this code I still 
think it'd be wiser for Mahout to move on to fastutil, trove or something 
similar that has a better test coverage.

I've tried to commit this in but my commit rights have been revoked apparently 
(perhaps this happens when you're on an emeritus status :).

                
      was (Author: dweiss):
    There are a bunch of issues, actually. In no particular order randomized 
testing against JUC allowed me to spot the following:

{code}
inconsistent c
                  
> Sets and maps incorrectly clear() their state arrays (potential endless loops)
> ------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1225
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1225
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.7
>         Environment: Eclipse, linux Fedora 17, Java 1.7, Mahout Maths 
> collections (Set) 0.7, hppc 0.4.3
>            Reporter: Sophie Sperner
>            Assignee: Dawid Weiss
>              Labels: hashset, java, mahout, test
>             Fix For: 0.7
>
>         Attachments: hppc-0.4.3.jar, MAHOUT-1225.patch, MAHOUT-1225.patch, 
> mushroom.dat
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The code I attached hangs on forever, Eclipse does not print me its stack 
> trace because it does not terminate the program. So I decided to make a small 
> test.java file that you can easily run.
> This code has the main function that simply runs getItemList() method which 
> successfully executes getDataset() method (here please download mushroom.dat 
> dataset and set the full path into filePath string variable) and the hangs on 
> (the problem happens on a fourth columnValues.add() call). After the dataset 
> was taken into X array, the code simply goes through X column by column and 
> searches for different items in it.
> If you uncomment IntSet columnValues = new IntOpenHashSet(); and 
> corresponding import headers then everything will work just fine (you will 
> also need to include hppc jar file found here 
> http://labs.carrotsearch.com/hppc.html or below in the attachment).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to