[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011910#comment-13011910 ]
Dmitriy Lyubimov commented on MAHOUT-633:
-----------------------------------------
Here is how HBase is fighting this:
https://issues.apache.org/jira/browse/HBASE-3455. It is not quite the same as batch
iteration, but it is basically the same thing I was talking about: cloning stuff
into one "old" reference and reusing it is more "benign" than creating tons of
short-lived references. It is not clear whether they got any significant performance
boost, but they clearly experience dramatically fewer full GCs in 0.90.1. I
think performance would be affected more in a CPU-bound batch job, though, than in
an HBase-type service.
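
To make the trade-off concrete, here is a rough sketch of the two iteration styles. It is only an illustration: ReuseSketch, Holder, and the method names are hypothetical and come from neither the Mahout patch nor HBASE-3455. One iterator allocates a new holder per element; the other refills a single long-lived holder, so callers that retain elements have to copy them.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class ReuseSketch {

  /** Hypothetical mutable record holder; a stand-in for a reusable value object. */
  static final class Holder {
    long value;

    Holder copy() {
      Holder h = new Holder();
      h.value = value;
      return h;
    }
  }

  /** Allocates a fresh Holder per element: many short-lived objects. */
  static Iterator<Holder> freshIterator(final long[] data) {
    return new Iterator<Holder>() {
      private int i = 0;

      @Override public boolean hasNext() { return i < data.length; }

      @Override public Holder next() {
        if (i >= data.length) throw new NoSuchElementException();
        Holder h = new Holder();
        h.value = data[i++];
        return h;
      }

      @Override public void remove() { throw new UnsupportedOperationException(); }
    };
  }

  /** Refills one long-lived Holder on every call to next(): one allocation in total. */
  static Iterator<Holder> reusingIterator(final long[] data) {
    final Holder shared = new Holder();
    return new Iterator<Holder>() {
      private int i = 0;

      @Override public boolean hasNext() { return i < data.length; }

      @Override public Holder next() {
        if (i >= data.length) throw new NoSuchElementException();
        shared.value = data[i++];
        return shared; // same instance every time
      }

      @Override public void remove() { throw new UnsupportedOperationException(); }
    };
  }

  /** Consumes each element immediately, so either iterator style is safe here. */
  static long sum(Iterator<Holder> it) {
    long total = 0;
    while (it.hasNext()) {
      total += it.next().value;
    }
    return total;
  }

  /** Keeps references to the elements; with a reusing iterator, copy must be true. */
  static List<Holder> retain(Iterator<Holder> it, boolean copy) {
    List<Holder> kept = new ArrayList<Holder>();
    while (it.hasNext()) {
      Holder h = it.next();
      kept.add(copy ? h.copy() : h);
    }
    return kept;
  }
}
```

The reusing variant is the one that keeps a single "old" reference alive instead of producing garbage per element. The cost is the aliasing hazard shown by retain: without the copy, every entry in the list is the same object holding the last value read.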
> Add SequenceFileIterable; put Iterable stuff in one place
> ---------------------------------------------------------
>
> Key: MAHOUT-633
> URL: https://issues.apache.org/jira/browse/MAHOUT-633
> Project: Mahout
> Issue Type: Improvement
> Components: Classification, Clustering, Collaborative Filtering
> Affects Versions: 0.4
> Reporter: Sean Owen
> Assignee: Sean Owen
> Priority: Minor
> Labels: iterable, iterator, sequence-file
> Fix For: 0.5
>
> Attachments: MAHOUT-633.patch, MAHOUT-633.patch
>
>
> In another project I have a useful little class, SequenceFileIterable, which
> simplifies iterating over a sequence file. It's like FileLineIterable. I'd
> like to add it, then use it throughout the code. See patch, which for now
> merely has the proposed new classes.
> Well it also moves some other iterator-related classes that seemed to be
> outside their rightful home in common.iterator.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira