[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269420#comment-13269420
 ] 

Harsh J commented on MAPREDUCE-2001:
------------------------------------

bq. The users would then in their mapper or reducer configure or setup method 
call SequenceFileOutputFormat.setMetadata with the appropriate metadata object 
that they would create.

This approach runs into a possible inconsistency/bug in the framework: the 
point at which the record writer is instantiated differs between the two APIs.

|| Record Writer Instantiation || Old API || New API ||
| Map Task | Before Mapper | After Mapper |
| Reduce Task | After Reducer | After Reducer |

See, for instance, the run{Old/New}{Mapper/Reducer} methods in 
MapTask.java/ReduceTask.java in 1.x. This has been the case for a very long 
time, and I think changing it may break behavior that several users rely on, 
including some code I wrote at my former workplace. That said, it's highly 
strange that no spec doc exists for this; we ought to write one via another 
JIRA.

Hence the mapper.configure() approach with a static method would unfortunately 
fail on Old API runs for map-only jobs, since there the record writer is 
instantiated before the mapper.
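
To make the ordering problem concrete, here is a toy model in plain Java. It 
uses no Hadoop classes; OrderingDemo, Writer and configureMapper are 
hypothetical stand-ins, and the two orderings simply mirror the Map Task row 
of the table above. It is a sketch of the failure mode, not the framework's 
actual code path.

```java
// Toy model only: none of these classes belong to Hadoop.
final class OrderingDemo {
  // Stands in for state set through a static setMetadata(...) call.
  static String metadata = null;

  // Stands in for a record writer: it reads the metadata once, when created.
  static final class Writer {
    final String captured;

    Writer() {
      this.captured = metadata;
    }
  }

  // Stands in for a mapper's configure()/setup() that sets the metadata.
  static void configureMapper() {
    metadata = "set-by-mapper";
  }

  // Ordering from the Old API column: writer created before the mapper.
  static String writerBeforeMapper() {
    metadata = null;
    Writer writer = new Writer();
    configureMapper();
    return writer.captured; // the writer never saw the mapper's metadata
  }

  // Ordering from the New API column: writer created after the mapper.
  static String writerAfterMapper() {
    metadata = null;
    configureMapper();
    Writer writer = new Writer();
    return writer.captured;
  }

  public static void main(String[] args) {
    System.out.println("writer first: " + writerBeforeMapper());
    System.out.println("mapper first: " + writerAfterMapper());
  }
}
```

In the first ordering the writer captures nothing, which is the situation a 
map-only job on the Old API would be in.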

bq. Then we make the SequenceFileOutputFormat JobConfigurable so that 
ReflectionUtils.newInstance will call configure on it and load the metadata.

I imagine this working much better. New API users may still be able to sneak 
in changes per map/reduce task, while Old API users would rely on the driver 
to provide the metadata.

bq. I think we should avoid users having to subclass SequenceFileOutputFormat. 
Thoughts?

Agreed, given your new approach via jobconf. Let's also make sure we serialize 
with base64 encoding or so, to allow for special chars in metadata if users so 
wish (because job.xml dislikes special chars).
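
A minimal sketch of what such a base64 round trip could look like, using only 
the JDK. MetadataCodec and its wire layout (an entry count followed by 
writeUTF key/value pairs) are assumptions made for illustration, not anything 
in the attached patch or in Hadoop; the resulting string would be stored as an 
ordinary jobconf property value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: packs metadata key/value pairs
// into one Base64 string that contains only [A-Za-z0-9+/=].
final class MetadataCodec {

  static String encode(Map<String, String> metadata) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(metadata.size());
      // Copy into a TreeMap so the entry order is deterministic.
      for (Map.Entry<String, String> e : new TreeMap<>(metadata).entrySet()) {
        // writeUTF limits each string to 65535 encoded bytes.
        out.writeUTF(e.getKey());
        out.writeUTF(e.getValue());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return Base64.getEncoder().encodeToString(bytes.toByteArray());
  }

  static Map<String, String> decode(String encoded) {
    Map<String, String> metadata = new TreeMap<>();
    byte[] raw = Base64.getDecoder().decode(encoded);
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        String key = in.readUTF();
        String value = in.readUTF();
        metadata.put(key, value);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return metadata;
  }

  public static void main(String[] args) {
    Map<String, String> metadata = new TreeMap<>();
    metadata.put("note", "<tag> & \"quotes\"\nnewline");
    String encoded = encode(metadata);
    System.out.println(encoded);
    System.out.println(decode(encoded).equals(metadata));
  }
}
```

The driver would call encode() and set the result on the jobconf; the output 
format would call decode() when it is configured.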
                
> Enhancement to SequenceFileOutputFormat to allow user to set MetaData
> ---------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2001
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2001
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.20.2
>            Reporter: David Rosenstrauch
>            Priority: Minor
>         Attachments: MAPREDUCE-2001.patch
>
>
> The org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat class 
> currently does not provide a way for the user to pass in a MetaData object to 
> be written to the SequenceFile.
> Currently the only way for a developer to implement this functionality appears 
> to be to create a subclass which overrides the SequenceFileOutputFormat's 
> getRecordWriter() method, which is a bit of a kludge.
> This seems to be a common enough request to warrant a fix of some sort.  
> (It's already been brought up twice in the past year:  
> http://www.mail-archive.com/common-user@hadoop.apache.org/msg02198.html and 
> http://www.mail-archive.com/mapreduce-user@hadoop.apache.org/msg00904.html)
> A couple of possible solutions:
> 1) Provide a static method SequenceFileOutputFormat.setMetaData(Job, MetaData)
> 2) Provide a (non-static) setMetaData() method on the 
> SequenceFileOutputFormat class.  The user would create a subclass of 
> SequenceFileOutputFormat which, say, implements Configurable.  Then in the 
> setConf() method, the user could create the MetaData object (using data from 
> the Configuration), and then call setMetaData.  The SequenceFileOutputFormat 
> would then use this MetaData object when creating the SequenceFile.  (Note 
> that the user would have to create a subclass of SequenceFileOutputFormat to 
> make this solution work.)

        

Reply via email to