[
https://issues.apache.org/jira/browse/MAHOUT-158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740070#action_12740070
]
Sean Owen commented on MAHOUT-158:
----------------------------------
For those interested: I was able to drive the memory requirements down to
roughly the expected value. It now runs comfortably in a heap of 360M, compared
to needing 600M before (and more like 1GB before MAHOUT-151/154).
It was an interesting lesson in GC ergonomics. I found myself running into
incredible GC overhead well before the heap was anywhere near full. I learned
the difference between the young generation and the tenured generation in the
GC: with the default memory layout, the JVM lets "old" objects consume only
about 75% of the heap. Now that this system is leaner, almost all objects in
memory are long-lived, and far less garbage is generated, since long
primitives are used instead of Longs and there is much less conversion between
the two. So I had to set -XX:NewRatio=9 (a tenured-to-young ratio of 9:1) to
allow more like 90% of the heap for 'old' objects. Then I was able to bring the
heap size down to a more reasonable value.
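As a rough sketch of the arithmetic behind that flag: -XX:NewRatio=N sizes the
tenured generation at N times the young generation, so the tenured share of the
heap is about N / (N + 1). The class and method names below are hypothetical
and only illustrate the calculation; they are not part of Mahout.

```java
final class NewRatioSketch {
    // Approximate fraction of the heap given to the tenured (old) generation
    // when the JVM runs with -XX:NewRatio=newRatio (tenured:young = newRatio:1).
    static double tenuredFraction(int newRatio) {
        return (double) newRatio / (newRatio + 1);
    }

    public static void main(String[] args) {
        // 3 corresponds to the ~75% figure above, 9 to the ~90% figure.
        for (int ratio : new int[] {3, 9}) {
            System.out.printf("NewRatio=%d -> tenured is about %.0f%% of the heap%n",
                    ratio, 100 * tenuredFraction(ratio));
        }
    }
}
```

On the command line this would look something like
"java -Xmx360m -XX:NewRatio=9 ...", with the rest of the invocation depending
on how the job is launched.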
I am proceeding to convert the tests now, as I review the changes. This is
another big one.
> Replace all ID values with long
> -------------------------------
>
> Key: MAHOUT-158
> URL: https://issues.apache.org/jira/browse/MAHOUT-158
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering
> Affects Versions: 0.2
> Reporter: Sean Owen
> Assignee: Sean Owen
> Fix For: 0.2
>
> Attachments: MAHOUT-158.patch
>
>
> As mentioned on the mailing list, I am tracking this as a possible change for
> evaluation. The idea is to save memory and CPU by avoiding the Object
> overhead of tens of millions of ID objects, using long IDs instead.
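
To illustrate the idea in the description, here is a minimal sketch contrasting
boxed and primitive ID storage. The class and method names are hypothetical and
do not come from the patch; the point is only that a long[] holds the values
inline, while a List<Long> holds references to separate Long objects.

```java
import java.util.ArrayList;
import java.util.List;

final class IdStorageSketch {
    // Boxed: every element is a reference to a separate Long object.
    static long sumBoxed(List<Long> ids) {
        long total = 0;
        for (Long id : ids) {
            total += id; // auto-unboxing on each access
        }
        return total;
    }

    // Primitive: values are stored directly in the array, no per-ID objects.
    static long sumPrimitive(long[] ids) {
        long total = 0;
        for (long id : ids) {
            total += id;
        }
        return total;
    }

    public static void main(String[] args) {
        int count = 1000;
        List<Long> boxed = new ArrayList<>(count);
        long[] primitive = new long[count];
        for (int i = 0; i < count; i++) {
            long id = 1_000_000L + i;
            boxed.add(id);     // auto-boxing creates a Long wrapper for each ID
            primitive[i] = id; // plain 8-byte store
        }
        // Same result either way; only the memory layout differs.
        System.out.println(sumBoxed(boxed) == sumPrimitive(primitive));
    }
}
```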