[
https://issues.apache.org/jira/browse/HADOOP-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573958#action_12573958
]
Doug Cutting commented on HADOOP-1230:
--------------------------------------
The strategy here is to remove stuff that is likely to evolve from interfaces
that users implement, so that we can change it without breaking user
implementations.
Note that, in general, interfaces are much more difficult to evolve than
abstract base classes, and should thus only be used when "mixin" behavior is
required, e.g., when one needs to be able to add an interface to an existing
class. For example, interfaces like Serializable and Writable are great,
since folks might need to have a class that implements both, which wouldn't be
possible if they were abstract base classes.
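
To make the mixin case concrete, here is a minimal sketch of one class
implementing both interfaces. The Writable interface below is a local stand-in
that mirrors the two methods of Hadoop's Writable so the example compiles on
its own; Point, MixinDemo, and copyViaWritable are made-up names for
illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.Serializable;
import java.io.UncheckedIOException;

class MixinDemo {
    /** Stand-in for Hadoop's Writable (assumption: declared locally for the sketch). */
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** One class mixing in both behaviors; not possible with two abstract base classes. */
    static class Point implements Serializable, Writable {
        private static final long serialVersionUID = 1L;
        int x;
        int y;

        Point() {}
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override public void write(DataOutput out) throws IOException {
            out.writeInt(x);
            out.writeInt(y);
        }

        @Override public void readFields(DataInput in) throws IOException {
            x = in.readInt();
            y = in.readInt();
        }
    }

    /** Round-trips a Point through its Writable methods using in-memory streams. */
    static Point copyViaWritable(Point src) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            src.write(new DataOutputStream(bytes));
            Point dst = new Point();
            dst.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return dst;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Point copy = copyViaWritable(new Point(3, 4));
        System.out.println(copy.x + "," + copy.y);
    }
}
```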
Some folks seem to believe that interfaces are somehow 'cleaner' and should be
used for all public facing APIs. But that's not true: interfaces are limited
to a subset of what abstract base classes can do, and an abstract class's
ability to provide default implementations of methods greatly facilitates API
evolution. Again (at the risk of sounding repetitive), the *only* advantage of
interfaces is that a class can implement more than one of them.
So, as we revisit our core APIs that users implement for the purpose of making
them more easily backward compatible, we should use abstract base classes in
place of interfaces whenever it is clear that multiple inheritance is not a
requirement.
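
A small sketch of the evolution argument, assuming a Java version without
default methods on interfaces (as was the case when this was written). The
Partitioner signature and the describe() method here are hypothetical and are
not taken from the patch; the point is only that a method added to an abstract
base class with a default body leaves existing subclasses compiling.

```java
class EvolutionDemo {
    /** Hypothetical framework base class, as it might look in a second release. */
    abstract static class Partitioner {
        /** Present since the first release: users must implement this. */
        abstract int getPartition(String key, int numPartitions);

        /** Added later, with a default body so older subclasses keep compiling. */
        String describe() {
            return getClass().getSimpleName();
        }
    }

    /** User code written against the first release; it never mentions describe(). */
    static class HashPartitioner extends Partitioner {
        @Override int getPartition(String key, int numPartitions) {
            // Mask the sign bit so the result is never negative.
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    public static void main(String[] args) {
        Partitioner p = new HashPartitioner();
        System.out.println(p.describe() + " -> " + p.getPartition("hadoop", 4));
    }
}
```

Had Partitioner been an interface, adding describe() would have required
every existing implementation to add the method before it compiled again.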
Some cases to consider:
- Context, MapContext, & ReduceContext: we do expect these to change
frequently, so an abstract base class might simplify things. On the other
hand, we don't expect users to implement these much, so it may not be critical.
Still, users might implement a framework that, e.g., wraps a context in
order to intercept its methods, and any change to the context interface would
break such code. So, are there compelling cases where we need to mix in these
interfaces? If not, these should be abstract base classes.
- Mapper, Reducer, Partitioner: we hope, after adding contexts, to never need
to change these interfaces again, and that all changes can be made through the
context. But still, if we did ever want to change them without breaking user
code, it would be possible if they're abstract base classes and would not be if
they're interfaces. So, again, are there compelling cases where these need to
be mixins? I know I've written code that uses them as mixins, but that's
mostly just because I was lazy: it was easier to add 'implements Mapper' and a
map() implementation than to add a new class. I can't think of a case where it
is required. Can you?
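
The two cases above can be sketched together. This assumes MapContext and
Mapper are abstract base classes, uses the method names from the issue
description (getKey(), getValue(), collect(), progress()) with String keys and
values for brevity, and invents ListContext, CountingContext, and
UpperCaseMapper purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ContextWrapperDemo {
    /** Hypothetical context, modeled as an abstract base class. */
    abstract static class MapContext {
        abstract String getKey();
        abstract String getValue();
        abstract void collect(String key, String value);

        /** Pretend this arrived in a later release; the no-op default keeps old subclasses compiling. */
        void progress() {}
    }

    /** Hypothetical Mapper, also an abstract base class rather than an interface. */
    abstract static class Mapper {
        abstract void map(MapContext context);
    }

    /** Trivial in-memory context holding one record and a list of outputs. */
    static class ListContext extends MapContext {
        final List<String> output = new ArrayList<>();
        private final String key;
        private final String value;

        ListContext(String key, String value) { this.key = key; this.value = value; }

        @Override String getKey() { return key; }
        @Override String getValue() { return value; }
        @Override void collect(String k, String v) { output.add(k + "=" + v); }
    }

    /** User-written wrapper that intercepts collect(); written before progress() existed. */
    static class CountingContext extends MapContext {
        private final MapContext inner;
        int collected = 0;

        CountingContext(MapContext inner) { this.inner = inner; }

        @Override String getKey() { return inner.getKey(); }
        @Override String getValue() { return inner.getValue(); }
        @Override void collect(String k, String v) {
            collected++;
            inner.collect(k, v);
        }
    }

    /** User mapper: emits the value upper-cased under the same key. */
    static class UpperCaseMapper extends Mapper {
        @Override void map(MapContext context) {
            context.collect(context.getKey(), context.getValue().toUpperCase());
        }
    }

    public static void main(String[] args) {
        ListContext base = new ListContext("k", "v");
        CountingContext counting = new CountingContext(base);
        new UpperCaseMapper().map(counting);
        System.out.println(counting.collected + " " + base.output);
    }
}
```

One caveat with this sketch: the wrapper keeps compiling after progress() is
added, but it inherits the no-op default and does not forward the call to the
wrapped context until someone updates it.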
Other more minor questions:
- Mapper & Reducer are no longer JobConfigurable. Was that intentional?
- Another benefit of contexts is that we can avoid serialization. Should we
add raw binary i/o from the outset, or add that in a subsequent pass?
> Replace parameters with context objects in Mapper, Reducer, Partitioner,
> InputFormat, and OutputFormat classes
> --------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-1230
> URL: https://issues.apache.org/jira/browse/HADOOP-1230
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Attachments: context-objs.patch
>
>
> This is a big change, but it will future-proof our APIs. To maintain
> backwards compatibility, I'd suggest that we move over to a new package name
> (org.apache.hadoop.mapreduce) and deprecate the old interfaces and package.
> Basically, it will replace:
> package org.apache.hadoop.mapred;
> public interface Mapper extends JobConfigurable, Closeable {
>   void map(WritableComparable key, Writable value, OutputCollector output,
>            Reporter reporter) throws IOException;
> }
> with:
> package org.apache.hadoop.mapreduce;
> public interface Mapper extends Closeable {
>   void map(MapContext context) throws IOException;
> }
> where MapContext has the methods like getKey(), getValue(), collect(Key,
> Value), progress(), etc.