[ 
https://issues.apache.org/jira/browse/MAHOUT-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011873#comment-13011873
 ] 

Ted Dunning commented on MAHOUT-633:
------------------------------------

{quote}
The one possible downside is that this implementation creates a new Writable 
for each read. This is mildly positive in that it avoids some common bugs in 
reading from sequence files, wherein the caller doesn't realize it's storing a 
reference to a Writable that keeps changing. (The Mahout code is 
cloning/instantiating new ones in most cases anyway, as it has to.) It does 
mean more objects allocated, though. While I think that overhead is probably 
minor compared to the I/O of the read itself, it's not obviously trivial.
{quote}

I think that creating a new Writable each time is probably a good thing on the 
whole.  I seriously doubt that it will be any slower, as long as it avoids 
unnecessary copying of a large structure.  If you avoid large copies, new 
allocation can actually be better than re-use, since new allocation keeps the 
garbage reclamation work in the newspace (young generation) collector, which is 
considerably cheaper than letting objects get copied several times and possibly 
even tenured.
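
A minimal sketch of the re-use bug mentioned in the quote, next to the 
allocate-per-read alternative. IntHolder is a hypothetical stand-in for a 
mutable Writable, and the two read methods are illustrative only; none of this 
is the Hadoop API or the code in the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class WritableReuseSketch {

    /** Hypothetical stand-in for a mutable Writable; not the Hadoop API. */
    public static final class IntHolder {
        int value;
    }

    /** Re-use style: every record is read into the same shared instance. */
    public static List<IntHolder> readWithReuse(int[] records) {
        List<IntHolder> kept = new ArrayList<>();
        IntHolder shared = new IntHolder();
        for (int record : records) {
            shared.value = record; // overwrites the value seen by earlier entries
            kept.add(shared);      // stores a reference, not a copy
        }
        return kept;
    }

    /** Allocate-per-read style: each record gets its own short-lived instance. */
    public static List<IntHolder> readWithFreshInstances(int[] records) {
        List<IntHolder> kept = new ArrayList<>();
        for (int record : records) {
            IntHolder fresh = new IntHolder();
            fresh.value = record;
            kept.add(fresh);
        }
        return kept;
    }

    /** Sums the values currently visible through the stored references. */
    public static int sum(List<IntHolder> holders) {
        int total = 0;
        for (IntHolder holder : holders) {
            total += holder.value;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] records = {1, 2, 3};
        // All three stored references point at one object holding the last value.
        System.out.println(sum(readWithReuse(records)));          // prints 9
        // Each stored reference keeps the value it was read with.
        System.out.println(sum(readWithFreshInstances(records))); // prints 6
    }
}
```

The sketch says nothing about speed; it only shows why a caller that keeps 
the returned object is safe when each read hands back a fresh instance.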

> Add SequenceFileIterable; put Iterable stuff in one place
> ---------------------------------------------------------
>
>                 Key: MAHOUT-633
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-633
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification, Clustering, Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Minor
>              Labels: iterable, iterator, sequence-file
>             Fix For: 0.5
>
>         Attachments: MAHOUT-633.patch, MAHOUT-633.patch
>
>
> In another project I have a useful little class, SequenceFileIterable, which 
> simplifies iterating over a sequence file. It's like FileLineIterable. I'd 
> like to add it, then use it throughout the code. See patch, which for now 
> merely has the proposed new classes. 
> It also moves some other iterator-related classes that seemed to be 
> outside their rightful home in common.iterator.

