[ https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011873#comment-13011873 ]
Ted Dunning commented on MAHOUT-633:
------------------------------------
{quote}
The one possible downside is that this implementation creates a new Writable
for each read. This is mildly positive in that it avoids some common bugs in
reading from sequence files, where the caller doesn't realize it is storing a
reference to a Writable that keeps changing. (The Mahout code clones or
instantiates new ones in most cases anyway, as it has to.) It does mean more
objects allocated, though. I think that overhead is probably minor compared to
the I/O of the read itself, but it's not obviously trivial.
{quote}
I think that creating a new Writable each time is probably a good thing on the
whole. I seriously doubt it will be any slower, as long as it avoids
unnecessary copying of a large structure. If you avoid large copies, new
allocation can actually be better than reuse: it keeps the garbage reclamation
work in the young-generation (new-space) collector, which is considerably
better than letting long-lived reused objects get copied several times and
possibly even tenured.
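To make the aliasing bug from the quoted text concrete, here is a minimal,
Hadoop-free sketch. It assumes a hypothetical mutable `MutableRecord` class
standing in for a Writable and a hypothetical `ArrayReader.next(into)` that
overwrites the caller-supplied object, in the same style as Hadoop's
`SequenceFile.Reader.next(key, value)`. None of these names are Mahout or
Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable record standing in for a Hadoop Writable.
final class MutableRecord {
  int value;
}

// Hypothetical reader that overwrites a caller-supplied record on each call,
// mimicking how a reused Writable is filled in. Returns false at end of input.
final class ArrayReader {
  private final int[] data;
  private int pos;

  ArrayReader(int[] data) {
    this.data = data;
  }

  boolean next(MutableRecord into) {
    if (pos >= data.length) {
      return false;
    }
    into.value = data[pos++];
    return true;
  }
}

final class AliasingDemo {
  // Buggy pattern: one record is reused and references to it are stored,
  // so every list element is the same object holding the last value read.
  static List<MutableRecord> collectReused(int[] data) {
    ArrayReader reader = new ArrayReader(data);
    List<MutableRecord> out = new ArrayList<>();
    MutableRecord reused = new MutableRecord();
    while (reader.next(reused)) {
      out.add(reused);
    }
    return out;
  }

  // Safer pattern: a new record per read, so stored references stay valid.
  static List<MutableRecord> collectFresh(int[] data) {
    ArrayReader reader = new ArrayReader(data);
    List<MutableRecord> out = new ArrayList<>();
    while (true) {
      MutableRecord fresh = new MutableRecord();
      if (!reader.next(fresh)) {
        break;
      }
      out.add(fresh);
    }
    return out;
  }
}
```

With input `{1, 2, 3}`, `collectReused` returns three references to a single
object whose value is 3, while `collectFresh` returns three distinct objects
holding 1, 2 and 3.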
> Add SequenceFileIterable; put Iterable stuff in one place
> ---------------------------------------------------------
>
> Key: MAHOUT-633
> URL: https://issues.apache.org/jira/browse/MAHOUT-633
> Project: Mahout
> Issue Type: Improvement
> Components: Classification, Clustering, Collaborative Filtering
> Affects Versions: 0.4
> Reporter: Sean Owen
> Assignee: Sean Owen
> Priority: Minor
> Labels: iterable, iterator, sequence-file
> Fix For: 0.5
>
> Attachments: MAHOUT-633.patch, MAHOUT-633.patch
>
>
> In another project I have a useful little class, SequenceFileIterable, which
> simplifies iterating over a sequence file. It's like FileLineIterable. I'd
> like to add it, then use it throughout the code. See patch, which for now
> merely has the proposed new classes.
> Well, it also moves some other iterator-related classes that seemed to be
> outside their rightful home in common.iterator.
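For readers unfamiliar with the idea, here is a minimal sketch of an Iterable
over a "fill-in-the-object" reader. It is not the attached patch: the names
`RecordSource` and `FreshInstanceIterable` are hypothetical, and no Hadoop
classes are used so it runs standalone. Each element is a freshly allocated
instance from a caller-supplied factory, matching the behavior discussed in the
comment above.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Hypothetical source that fills a caller-supplied object, in the style of
// SequenceFile.Reader.next(key, value). Returns false when exhausted.
interface RecordSource<T> {
  boolean next(T into);
}

// Hypothetical Iterable in the spirit of SequenceFileIterable: every element
// is a newly allocated instance, so callers may safely keep references.
final class FreshInstanceIterable<T> implements Iterable<T> {
  private final Supplier<RecordSource<T>> sourceFactory;
  private final Supplier<T> instanceFactory;

  FreshInstanceIterable(Supplier<RecordSource<T>> sourceFactory, Supplier<T> instanceFactory) {
    this.sourceFactory = sourceFactory;
    this.instanceFactory = instanceFactory;
  }

  @Override
  public Iterator<T> iterator() {
    // Open a new pass over the data for every iterator.
    RecordSource<T> source = sourceFactory.get();
    return new Iterator<T>() {
      private T pending;     // element read ahead by hasNext(), not yet returned
      private boolean done;  // true once the source reports exhaustion

      @Override
      public boolean hasNext() {
        if (pending == null && !done) {
          T candidate = instanceFactory.get(); // new instance per read
          if (source.next(candidate)) {
            pending = candidate;
          } else {
            done = true;
          }
        }
        return pending != null;
      }

      @Override
      public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        T result = pending;
        pending = null;
        return result;
      }
    };
  }
}
```

Because `iterator()` opens a new source each time, the Iterable can be
traversed more than once. A real version over a sequence file would also need
to close the underlying reader, which this sketch omits.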