[
https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012634#comment-13012634
]
Dmitriy Lyubimov commented on MAHOUT-633:
-----------------------------------------
bq. And, in about half the cases, the caller is cloning the key and/or value
because it wants to save a copy. So in some cases it's already making new
objects.
Yes, that's true. We can't prevent people from screwing it up. We can only give
them a chance not to. And I want that chance for myself.
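To make the trade-off concrete, here is a minimal sketch of the reuse pattern
being discussed. It is not Mahout or Hadoop code: ReuseSketch, IntHolder,
reusingIterator, sum and retain are hypothetical names, and IntHolder merely
stands in for a mutable Writable key or value. The iterator hands out the same
instance on every call, so a streaming caller allocates nothing per record,
while a caller that keeps records must copy them.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class ReuseSketch {

  /** Hypothetical mutable record, standing in for a Writable key or value. */
  static final class IntHolder {
    int value;

    IntHolder copy() {
      IntHolder c = new IntHolder();
      c.value = value;
      return c;
    }
  }

  /** Returns an iterator that refills and returns ONE shared holder. */
  static Iterator<IntHolder> reusingIterator(final int[] data) {
    return new Iterator<IntHolder>() {
      private final IntHolder holder = new IntHolder();
      private int pos = 0;

      @Override
      public boolean hasNext() {
        return pos < data.length;
      }

      @Override
      public IntHolder next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        holder.value = data[pos++]; // overwrite in place, no allocation
        return holder;
      }
    };
  }

  /** Streaming use: each record is consumed immediately, so reuse is safe. */
  static long sum(int[] data) {
    long total = 0;
    for (Iterator<IntHolder> it = reusingIterator(data); it.hasNext();) {
      total += it.next().value;
    }
    return total;
  }

  /** Retaining use: without a copy, every kept entry aliases the same holder. */
  static List<Integer> retain(int[] data, boolean copy) {
    List<IntHolder> kept = new ArrayList<>();
    for (Iterator<IntHolder> it = reusingIterator(data); it.hasNext();) {
      IntHolder h = it.next();
      kept.add(copy ? h.copy() : h);
    }
    List<Integer> values = new ArrayList<>();
    for (IntHolder h : kept) {
      values.add(h.value);
    }
    return values;
  }
}
```

With this sketch, a caller that only streams through the records (sum) never
pays for a copy. A caller that wants to save records pays for exactly the
copies it asks for, and if it forgets to copy, every saved entry ends up
showing the last record read.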
bq. I had in mind that this factor is probably dwarfed by I/O and the actual
deserialization... right? I had the impression these Hadoop jobs were most
certainly I/O bound, not memory/GC/CPU bound.
Not in SSVD. It packs parts of a massive-scale QR and the stochastic projection
into one map step, and I saw 98.8% average CPU saturation. That basically told
me I wasn't wasteful on I/O, which I tried pretty hard not to be. QR algorithms
are quadratic, even though we reduce the scale of the problem. I am still a
little bit wasteful on flops here, but it's not dramatic and it's got to be
enough for open source. So this near-limit memory use and GC stuff will affect
me very, very much. I built a series of jobs with similar dynamics before in
Java 1.5, and it was pretty bad (up to 50 times slower) until I employed the
strategies I described above.
> Add SequenceFileIterable; put Iterable stuff in one place
> ---------------------------------------------------------
>
> Key: MAHOUT-633
> URL: https://issues.apache.org/jira/browse/MAHOUT-633
> Project: Mahout
> Issue Type: Improvement
> Components: Classification, Clustering, Collaborative Filtering
> Affects Versions: 0.4
> Reporter: Sean Owen
> Assignee: Sean Owen
> Priority: Minor
> Labels: iterable, iterator, sequence-file
> Fix For: 0.5
>
> Attachments: MAHOUT-633.patch, MAHOUT-633.patch, MAHOUT-633.patch
>
>
> In another project I have a useful little class, SequenceFileIterable, which
> simplifies iterating over a sequence file. It's like FileLineIterable. I'd
> like to add it, then use it throughout the code. See patch, which for now
> merely has the proposed new classes.
> Well, it also moves some other iterator-related classes that seemed to be
> outside their rightful home in common.iterator.