[ https://issues.apache.org/jira/browse/MAHOUT-158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740070#action_12740070 ]

Sean Owen commented on MAHOUT-158:
----------------------------------

For those interested, I was able to drive the memory requirements down closer to
the expected value: it now runs comfortably in a 360M heap, compared to needing
600M before (and closer to 1GB before MAHOUT-151/154).

It was an interesting lesson in GC ergonomics. I kept running into heavy GC
overhead long before the heap was full, not even close to it. The cause was the
split between the young generation and the tenured (old) generation: with the
default memory layout, long-lived "old" objects may occupy only about 75% of
the heap. Now that the system is leaner, almost all objects in memory are
long-lived, and far less garbage is generated, since primitive longs are used
instead of Longs and there is much less conversion between the two. So I set
-XX:NewRatio=9 to allow roughly 90% of the heap for old objects. With that, I
was able to bring the heap size down to a more reasonable value.
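The arithmetic behind the flag: NewRatio sets the ratio of old-generation size
to young-generation size, so with -XX:NewRatio=N the old generation receives
N/(N+1) of the heap. A ratio of 3 corresponds to a 75% old generation and a
ratio of 9 to 90%. The Java sketch below works this out for the 360M heap; the
class and method names are hypothetical, and the model is idealized (it treats
the heap as young + old only and ignores survivor-space and alignment details).

```java
/**
 * Hypothetical sketch: how -XX:NewRatio=N splits a heap between the young and
 * old (tenured) generations. NewRatio is old:young, so old = N / (N + 1).
 * Idealized model: ignores survivor spaces, alignment and ergonomic resizing.
 */
public class NewRatioSplit {

    /** Fraction of the heap given to the old generation for a given NewRatio. */
    static double oldFraction(int newRatio) {
        if (newRatio < 1) {
            throw new IllegalArgumentException("NewRatio must be >= 1");
        }
        return (double) newRatio / (newRatio + 1);
    }

    /** Old-generation size in megabytes for a heap of heapMb megabytes. */
    static long oldGenMb(long heapMb, int newRatio) {
        if (newRatio < 1) {
            throw new IllegalArgumentException("NewRatio must be >= 1");
        }
        return heapMb * newRatio / (newRatio + 1);
    }

    public static void main(String[] args) {
        long heapMb = 360; // heap size mentioned in the comment above
        for (int ratio : new int[] {3, 9}) {
            System.out.printf("NewRatio=%d: old gen %.0f%% of heap, %d MB of %d MB%n",
                    ratio, 100 * oldFraction(ratio), oldGenMb(heapMb, ratio), heapMb);
        }
    }
}
```

Running main prints "NewRatio=3: old gen 75% of heap, 270 MB of 360 MB" and
"NewRatio=9: old gen 90% of heap, 324 MB of 360 MB". The actual pool sizes a
JVM reports may differ slightly, since the young generation is further divided
into eden and survivor spaces.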

I am proceeding to convert the tests now, as I review the changes. This is 
another big one.

> Replace all ID values with long
> -------------------------------
>
>                 Key: MAHOUT-158
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-158
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.2
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>             Fix For: 0.2
>
>         Attachments: MAHOUT-158.patch
>
>
> As mentioned on the mailing list, I am tracking this as a possible change for 
> evaluation. The idea is to save more memory / CPU by avoiding the Object 
> overhead of tens of millions of ID objects by using long IDs instead.
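To make the idea concrete: a boxed Long is a separate heap object with its own
header on top of the 8-byte value, plus the reference that points to it, while
a long stored in a primitive array costs exactly 8 bytes. The sketch below is a
hypothetical illustration, not Mahout's actual classes: an immutable ID set
backed by a sorted long[], where lookups use Arrays.binarySearch on primitives
and so create no per-ID objects or per-lookup garbage.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Mahout's API): an immutable set of IDs stored as a
 * sorted, de-duplicated primitive long[] instead of a collection of Longs.
 */
public final class LongIdSet {

    private final long[] ids; // sorted ascending, no duplicates

    public LongIdSet(long... rawIds) {
        long[] sorted = rawIds.clone();
        Arrays.sort(sorted);
        // Compact in place, keeping only the first copy of each ID.
        int n = 0;
        for (int i = 0; i < sorted.length; i++) {
            if (n == 0 || sorted[i] != sorted[n - 1]) {
                sorted[n++] = sorted[i];
            }
        }
        this.ids = Arrays.copyOf(sorted, n);
    }

    /** Binary search over primitives: no boxing, so no garbage per lookup. */
    public boolean contains(long id) {
        return Arrays.binarySearch(ids, id) >= 0;
    }

    public int size() {
        return ids.length;
    }

    public static void main(String[] args) {
        // 10000000000L is outside int range, which is one reason to use long IDs.
        LongIdSet users = new LongIdSet(42L, 7L, 10000000000L, 7L);
        System.out.println(users.size());       // 3
        System.out.println(users.contains(7L)); // true
        System.out.println(users.contains(8L)); // false
    }
}
```

A sorted array suits a read-mostly ID set; a mutable version would more
typically use an open-addressing hash table keyed directly on long values,
which likewise avoids boxing.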

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
