[ https://issues.apache.org/jira/browse/HADOOP-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544184 ]

stack commented on HADOOP-2244:
-------------------------------

Pardon.  I should say more.  Owen, true.  The 'instances' was quoted in the 
original description.  It was meant as shorthand for: readFields produces a 
sort-of new instance -- the internal Writable representation is blasted and 
overwritten with new data -- but they are not really new instances, since 
the object is being reused...
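
To make that concrete, here is a minimal sketch of the reuse pattern (not 
actual Hadoop code; the DataInput and record count are assumed to come from 
the caller):

  import java.io.DataInput;
  import java.io.IOException;

  import org.apache.hadoop.io.MapWritable;
  import org.apache.hadoop.util.ReflectionUtils;

  public class ReusePattern {
    // One Writable, created reflectively, is reused for every record.
    static void readAll(DataInput in, int recordCount) throws IOException {
      MapWritable value =
          (MapWritable) ReflectionUtils.newInstance(MapWritable.class, null);
      for (int i = 0; i < recordCount; i++) {
        // Intended to yield a fresh 'instance' each time; in reality it
        // only overwrites the same object's state, and (before the fix)
        // MapWritable kept appending to its internal hash map.
        value.readFields(in);
        // ... use 'value', e.g. write it back out ...
      }
    }
  }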

> MapWritable.readFields needs to clear internal hash else instance accumulates 
> entries forever
> ---------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2244
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2244
>             Project: Hadoop
>          Issue Type: Bug
>          Components: io
>            Reporter: stack
>             Fix For: 0.16.0
>
>         Attachments: hadoop-2244.patch
>
>
> A common framework pattern is to get an instance of a Writable, usually by 
> reflection, and then just keep calling readFields to make new 'instances' of 
> the particular Writable.
> For example, the spill-to-disk pass that runs at the end of a map task gets 
> instances of the map output key and value classes and then loops over the 
> (sorted) map output, calling readFields to produce the instances it writes 
> out to the filesystem (See around line #470 in the spill method).
> If the particular Writable is a MapWritable, we currently get funny 
> results (see the sketch after this description).  It has an internal hash 
> map that is created on instantiation.  Each time readFields is called, the 
> newly deserialized entries are added to that internal map.  The map needs 
> to be cleared when readFields is called so it doesn't just keep growing ad 
> infinitum.
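
The one-line fix is to clear the backing map at the top of readFields.  A 
minimal, self-contained sketch of the idea (not the actual MapWritable 
source; key and value types are fixed to Text for brevity, and the field 
name 'instance' is illustrative):

  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;
  import java.util.HashMap;
  import java.util.Map;

  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.io.Writable;

  public class SimpleMapWritable implements Writable {
    private final Map<Text, Text> instance = new HashMap<Text, Text>();

    public void write(DataOutput out) throws IOException {
      out.writeInt(instance.size());
      for (Map.Entry<Text, Text> e : instance.entrySet()) {
        e.getKey().write(out);
        e.getValue().write(out);
      }
    }

    public void readFields(DataInput in) throws IOException {
      // The fix: reset the map first; otherwise entries from every
      // previous readFields call remain and the map grows without bound
      // when the same object is reused across records.
      instance.clear();
      int entries = in.readInt();
      for (int i = 0; i < entries; i++) {
        Text key = new Text();
        key.readFields(in);
        Text value = new Text();
        value.readFields(in);
        instance.put(key, value);
      }
    }
  }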
