[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269698#comment-13269698
 ] 

Harsh J commented on MAPREDUCE-2001:
------------------------------------

bq. Why would the configure method of the mapper care if the 
recordwriter/outputformat had been created yet?

It doesn't care on its own, but advanced users may be relying on this 
ordering. For instance, I once relied on it to have my output format inject 
a few strings into the JobConf upon RecordWriter instantiation (some logic 
dependent on input format initialization, which happens even earlier), so 
that I could then read those strings in my mapper's configure(). True, I was 
probably doing something wrong, and what I did can be done in a 
better/alternate way, but I ended up relying on that behavior, and that's 
the kind of breakage I'm talking about.
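
To make the ordering dependency concrete, here is a toy model in plain Java. 
It is only a sketch: ToyConf, ToyOutputFormat, ToyMapper and the key 
"toy.injected" are hypothetical stand-ins, not Hadoop classes, and the real 
task setup code is more involved. It only shows that the mapper sees the 
injected value when the writer is created first, and nothing otherwise.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: every class here is a hypothetical stand-in, not Hadoop's API.
public class InitOrderSketch {

    /** Stand-in for a JobConf: a plain string-to-string map. */
    static class ToyConf {
        private final Map<String, String> values = new HashMap<>();
        void set(String key, String value) { values.put(key, value); }
        String get(String key) { return values.get(key); }
    }

    /** Stand-in for an output format that writes into the conf as a side effect. */
    static class ToyOutputFormat {
        void createRecordWriter(ToyConf conf) {
            conf.set("toy.injected", "from-output-format");
        }
    }

    /** Stand-in for a mapper whose configure() reads the injected value. */
    static class ToyMapper {
        String seen;
        void configure(ToyConf conf) { seen = conf.get("toy.injected"); }
    }

    /** Runs the two setup steps in the requested order; returns what the mapper saw. */
    static String run(boolean writerFirst) {
        ToyConf conf = new ToyConf();
        ToyOutputFormat format = new ToyOutputFormat();
        ToyMapper mapper = new ToyMapper();
        if (writerFirst) {
            format.createRecordWriter(conf); // injection happens before configure()
            mapper.configure(conf);
        } else {
            mapper.configure(conf);          // configure() runs before the injection
            format.createRecordWriter(conf);
        }
        return mapper.seen;
    }

    public static void main(String[] args) {
        System.out.println("writer first:    " + run(true));
        System.out.println("configure first: " + run(false));
    }
}
```

With the writer created first the mapper reads "from-output-format"; with 
configure first it reads null, which is the breakage described above.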

bq. I would think we would want the recordwriter/outputformat to get configured 
after the configure method to allow tasks to make task level config changes to 
a recordwriter/outputformat

True. I just don't know why it's this way in the old API. Probably an oversight.

bq. I am confused by this comment, do you agree with my approach or are you 
just disappointed that the behavior will be inconsistent between the old and 
new api for map only jobs?

Sorry for the confusion. It's just the latter. I agree with your approach.
                
> Enhancement to SequenceFileOutputFormat to allow user to set MetaData
> ---------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2001
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2001
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.20.2
>            Reporter: David Rosenstrauch
>            Priority: Minor
>         Attachments: MAPREDUCE-2001.patch
>
>
> The org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat class 
> currently does not provide a way for the user to pass in a MetaData object to 
> be written to the SequenceFile.
> Currently the only way for a developer to implement this functionality appears 
> to be to create a subclass which overrides the SequenceFileOutputFormat's 
> getRecordWriter() method, which is a bit of a kludge.
> This seems to be a common enough request to warrant a fix of some sort.  
> (It's already been brought up twice in the past year:  
> http://www.mail-archive.com/common-user@hadoop.apache.org/msg02198.html and 
> http://www.mail-archive.com/mapreduce-user@hadoop.apache.org/msg00904.html)
> A couple of possible solutions:
> 1) provide a static method SequenceFileOutputFormat.setMetaData(Job, MetaData)
> 2) Provide a (non-static) setMetaData() method on the 
> SequenceFileOutputFormat class.  The user would create a subclass of 
> SequenceFileOutputFormat which, say, implements Configurable.  Then in the 
> setConf() method, the user could create the MetaData object (using data from 
> the Configuration), and then call setMetaData.  The SequenceFileOutputFormat 
> would then use this MetaData object when creating the SequenceFile.  (Note 
> that the user would have to create a subclass of SequenceFileOutputFormat to 
> make this solution work.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
