[jira] Commented: (HADOOP-1230) Replace parameters with context objects in Mapper, Reducer, Partitioner, InputFormat, and OutputFormat classes

Doug Cutting (JIRA) Fri, 29 Feb 2008 13:01:56 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573958#action_12573958
 ]


Doug Cutting commented on HADOOP-1230:
--------------------------------------

The strategy here is to move anything that is likely to evolve out of the 
interfaces that users implement, so that we can change it without breaking 
user implementations.

Note that, in general, interfaces are much more difficult to evolve than 
abstract base classes, and should thus only be used when "mixin" behavior is 
required, e.g., when one needs to be able to add an interface to an existing 
class.  For example, interfaces like Serializable and Writable are great, 
since folks might need to have a class that implements both, which wouldn't be 
possible if they were abstract base classes.

Some folks seem to believe that interfaces are somehow 'cleaner' and should be 
used for all public-facing APIs.  But that's not true: interfaces are limited 
to a subset of what abstract base classes can do, and an abstract class's 
ability to provide default implementations of methods greatly facilitates API 
evolution.  Again (at the risk of sounding repetitive), the *only* advantage of 
interfaces is that a class can implement more than one of them.
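
To make the evolution argument concrete, here is a minimal sketch.  All of the 
names in it (Context, Mapper, setup, UpperCaseMapper) are hypothetical and 
chosen only for illustration; they are not the API in the attached patch.  It 
assumes a base class that gained a setup() method after user code was written:

```java
import java.util.ArrayList;
import java.util.List;

public class EvolutionSketch {
    // Hypothetical context: stands in for whatever the framework hands to map().
    static class Context {
        final List<String> events = new ArrayList<>();
    }

    // "Version 2" of a hypothetical abstract base class. setup() was added
    // after user code was written; its default body keeps old subclasses valid.
    abstract static class Mapper {
        void setup(Context context) {
            context.events.add("default setup");
        }

        abstract void map(String key, String value, Context context);
    }

    // "Old" user code written against version 1: it only knows about map().
    static class UpperCaseMapper extends Mapper {
        @Override
        void map(String key, String value, Context context) {
            context.events.add(key + "=" + value.toUpperCase());
        }
    }

    // Drives the mapper the way a framework might: setup() first, then map().
    static List<String> run() {
        Context context = new Context();
        Mapper mapper = new UpperCaseMapper();
        mapper.setup(context);
        mapper.map("k", "v", context);
        return context.events;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

UpperCaseMapper needs no change when setup() appears in the base class.  Had 
Mapper been an interface, adding setup() would have required every existing 
implementation to be edited before it compiled again.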

So, as we revisit our core APIs that users implement for the purpose of making 
them more easily backward compatible, we should use abstract base classes in 
place of interfaces whenever it is clear that multiple inheritance is not a 
requirement.

Some cases to consider:
- Context, MapContext, & ReduceContext: we do expect these to change 
frequently, so an abstract base class might simplify things.  But, on the other 
hand, we don't expect users to implement these much, so it may not be critical. 
 But users still might implement a framework that, e.g., wraps a context in 
order to intercept its methods, and any changes to the context interface would 
break such code.  So, are there compelling cases where we need to mix in these 
interfaces?  If not, these should be abstract base classes.
- Mapper, Reducer, Partitioner: we hope, after adding contexts, never to need 
to change these interfaces again, since all changes can be done through the 
context.  But still, if we did ever want to change them without breaking user 
code, it would be possible if they're abstract base classes and would not be if 
they're interfaces.  So, again, are there compelling cases where these need to 
be mixins?  I know I've written code that uses them as mixins, but that's 
mostly just because I was lazy: it was easier to add  'implements Mapper' and a 
map() implementation than to add a new class.  I can't think of a case where it 
is required.  Can you?
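
The wrapping case in the first item can be sketched as follows.  This is only 
an illustration under assumed names: MapContext here is a hypothetical abstract 
base class with a collect() method, and progress() stands for a method added in 
some later release.  None of this is taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class ContextWrapperSketch {
    // Hypothetical abstract base class for the map-side context.
    abstract static class MapContext {
        abstract void collect(String key, String value);

        // Assumed to be added in a later release; the no-op default means
        // subclasses written before it existed still compile.
        void progress() {
        }
    }

    // Simple in-memory context that records collected pairs.
    static class ListContext extends MapContext {
        final List<String> output = new ArrayList<>();

        @Override
        void collect(String key, String value) {
            output.add(key + "=" + value);
        }
    }

    // User-written wrapper that intercepts collect() to count records and
    // then forwards the call to the wrapped context.
    static class CountingContext extends MapContext {
        private final MapContext delegate;
        int collected = 0;

        CountingContext(MapContext delegate) {
            this.delegate = delegate;
        }

        @Override
        void collect(String key, String value) {
            collected++;
            delegate.collect(key, value);
        }
    }

    public static void main(String[] args) {
        ListContext sink = new ListContext();
        CountingContext counting = new CountingContext(sink);
        counting.collect("a", "1");
        counting.collect("b", "2");
        counting.progress(); // inherited default; the wrapper still compiles
        System.out.println(counting.collected + " " + sink.output);
    }
}
```

One caveat with this sketch: the wrapper keeps compiling when progress() is 
added, but it inherits the no-op default rather than forwarding to its 
delegate, so wrapper authors would still want to revisit their code eventually.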

Other more minor questions:
- Mapper & Reducer are no longer JobConfigurable.  Was that intentional?
- Another benefit of contexts is that we can avoid serialization.  Should we 
add raw binary i/o from the outset, or add that in a subsequent pass?


> Replace parameters with context objects in Mapper, Reducer, Partitioner, 
> InputFormat, and OutputFormat classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1230
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1230
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: context-objs.patch
>
>
> This is a big change, but it will future-proof our APIs. To maintain 
> backwards compatibility, I'd suggest that we move over to a new package name 
> (org.apache.hadoop.mapreduce) and deprecate the old interfaces and package. 
> Basically, it will replace:
> package org.apache.hadoop.mapred;
> public interface Mapper extends JobConfigurable, Closeable {
>   void map(WritableComparable key, Writable value, OutputCollector output, 
> Reporter reporter) throws IOException;
> }
> with:
> package org.apache.hadoop.mapreduce;
> public interface Mapper extends Closeable {
>   void map(MapContext context) throws IOException;
> }
> where MapContext has methods like getKey(), getValue(), collect(Key, 
> Value), progress(), etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
