[jira] Commented: (HADOOP-1230) Replace parameters with context objects in Mapper, Reducer, Partitioner, InputFormat, and OutputFormat classes

Owen O'Malley (JIRA) Wed, 30 Jul 2008 08:50:22 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-1230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618396#action_12618396 ]


Owen O'Malley commented on HADOOP-1230:
---------------------------------------

{quote}
Q1: Why does the string version of the Context.getCounter() method need an int id?
{quote}

That was a slip up. The old interface looked like that. I'll fix it.

{quote}
Q2: Would I be able to have a subclass of the Context that supports multiple 
outputs (i.e., via the MultipleOutputs class)?
{quote}

One advantage of making Mapper a base class instead of an interface is that it 
allows something like:

{code}
class MultipleOutputMapper extends Mapper {
  // private state, including the outer context
  <K,V> void collect(String dest, K key, V value) throws IOException { ... }
}
{code}

Then the user's mapper can extend MultipleOutputMapper and get the additional 
collect method. Does that make sense? It would also be possible to have the 
MultipleOutputMapper make a wrapper Context that included the additional 
method, but the map method would then need to downcast its context, which seems 
less user-friendly.
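
A minimal, self-contained Java sketch of the base-class approach follows. All 
of it is hypothetical and simplified: Context stands in for the outer context, 
map() is assumed to take the key, value, and context, and the named outputs 
live in an in-memory map rather than in real output files. What it shows is 
that the user's mapper calls the inherited collect(dest, key, value) directly, 
with no downcast.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the outer context: only the main output.
interface Context<K, V> {
  void collect(K key, V value) throws IOException;
}

// Hypothetical Mapper base class; assumes map() receives key, value, and context.
abstract class Mapper<KI, VI, KO, VO> {
  public abstract void map(KI key, VI value, Context<KO, VO> context) throws IOException;
}

// Adds a named-output collect(); the extra state stays private to this class.
abstract class MultipleOutputMapper<KI, VI, KO, VO> extends Mapper<KI, VI, KO, VO> {
  // In-memory stand-in for side outputs, keyed by destination name.
  private final Map<String, List<String>> namedOutputs = new LinkedHashMap<>();

  protected <K, V> void collect(String dest, K key, V value) throws IOException {
    namedOutputs.computeIfAbsent(dest, d -> new ArrayList<>()).add(key + "\t" + value);
  }

  Map<String, List<String>> namedOutputs() {
    return namedOutputs;
  }
}

// A user's mapper: main output via the context, side output via the inherited method.
class RoutingMapper extends MultipleOutputMapper<Long, String, String, Integer> {
  @Override
  public void map(Long offset, String line, Context<String, Integer> context) throws IOException {
    context.collect(line, line.length());  // main output: (line, length)
    if (line.isEmpty()) {
      // Side output. With a wrapper Context instead, this would need a cast of context first.
      collect("empty", offset, line);
    }
  }
}
```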

{quote}
C1: Have you considered instead having a single Context with an InContext and 
an OutContext, where the IN contains incoming stuff (key, values, splits, 
jobconf, etc.) and the OUT is used for the output stuff (collect)?
{quote}

Fundamentally, the map and reduce inputs are the same and are handled by the 
TaskAttemptContext. The ReduceContext just provides the utility function 
getValues() to iterate through the values for the current key. I think it would 
be more confusing to have separate input, state, and output contexts.
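
To make the "same input, plus getValues()" point concrete, here is a rough 
sketch. InputContext, nextKey(), and the in-memory sorted list are assumptions 
for illustration only, not the API in the attached patches; getValues() just 
walks the shared input cursor while the key stays the same.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Objects;

// Hypothetical input side shared by map and reduce tasks: a cursor over key/value pairs.
interface InputContext<K, V> {
  boolean nextKeyValue();
  K getCurrentKey();
  V getCurrentValue();
}

// Hypothetical reduce context: the same input cursor plus grouping helpers.
class ReduceContext<K, V> implements InputContext<K, V> {
  private final List<Map.Entry<K, V>> sorted;  // assumed already sorted by key
  private int pos = -1;                        // index of the current pair

  ReduceContext(List<Map.Entry<K, V>> sortedPairs) {
    this.sorted = sortedPairs;
  }

  public boolean nextKeyValue() { return ++pos < sorted.size(); }
  public K getCurrentKey() { return sorted.get(pos).getKey(); }
  public V getCurrentValue() { return sorted.get(pos).getValue(); }

  // Moves to the first pair of the next distinct key; false once input is exhausted.
  public boolean nextKey() {
    if (pos < 0) {
      pos = 0;
    } else if (pos < sorted.size()) {
      K key = getCurrentKey();
      while (pos < sorted.size() && Objects.equals(sorted.get(pos).getKey(), key)) {
        pos++;
      }
    }
    return pos < sorted.size();
  }

  // Iterates the values for the current key, advancing the shared cursor as it goes.
  public Iterable<V> getValues() {
    final K key = getCurrentKey();
    return () -> new Iterator<V>() {
      private boolean first = true;  // start with the value at the current position

      @Override
      public boolean hasNext() {
        return first
            || (pos + 1 < sorted.size() && Objects.equals(sorted.get(pos + 1).getKey(), key));
      }

      @Override
      public V next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        if (first) {
          first = false;
        } else {
          pos++;
        }
        return getCurrentValue();
      }
    };
  }
}
```

A reducer would loop with while (context.nextKey()) and fold over 
context.getValues(); a mapper would use only nextKeyValue() and the 
getCurrent methods of the same input interface.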

> Replace parameters with context objects in Mapper, Reducer, Partitioner, InputFormat, and OutputFormat classes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1230
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1230
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: context-objs-2.patch, context-objs-3.patch, context-objs.patch
>
>
> This is a big change, but it will future-proof our APIs. To maintain 
> backwards compatibility, I'd suggest that we move over to a new package name 
> (org.apache.hadoop.mapreduce) and deprecate the old interfaces and package. 
> Basically, it will replace:
> package org.apache.hadoop.mapred;
> public interface Mapper extends JobConfigurable, Closeable {
>   void map(WritableComparable key, Writable value, OutputCollector output, Reporter reporter) throws IOException;
> }
> with:
> package org.apache.hadoop.mapreduce;
> public interface Mapper extends Closeable {
>   void map(MapContext context) throws IOException;
> }
> where MapContext has methods such as getKey(), getValue(), collect(Key, 
> Value), progress(), etc.
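
A rough Java rendering of the proposed shape, for illustration only. It assumes 
generic type parameters (the proposal uses raw WritableComparable/Writable), 
and ListMapContext is a hypothetical in-memory context for trying a mapper 
outside the framework; none of these are the real Hadoop classes.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical context: the current input pair plus the output and progress side.
interface MapContext<K, V, KO, VO> {
  K getKey();
  V getValue();
  void collect(KO key, VO value) throws IOException;
  void progress();
}

// Mapper as described in the proposal: a single context parameter.
interface Mapper<K, V, KO, VO> extends Closeable {
  void map(MapContext<K, V, KO, VO> context) throws IOException;
}

// Example mapper: emits (word, 1) for each whitespace-separated token of the value.
class TokenMapper implements Mapper<Long, String, String, Integer> {
  @Override
  public void map(MapContext<Long, String, String, Integer> context) throws IOException {
    for (String word : context.getValue().split("\\s+")) {
      if (!word.isEmpty()) {  // a leading space yields one empty token
        context.collect(word, 1);
      }
    }
    context.progress();  // report liveness once per record
  }

  @Override
  public void close() { }  // nothing to release in this sketch
}

// Hypothetical in-memory context that records collected pairs as "key=value" strings.
class ListMapContext<K, V, KO, VO> implements MapContext<K, V, KO, VO> {
  final List<String> output = new ArrayList<>();
  int progressCalls = 0;
  private final K key;
  private final V value;

  ListMapContext(K key, V value) {
    this.key = key;
    this.value = value;
  }

  public K getKey() { return key; }
  public V getValue() { return value; }
  public void collect(KO k, VO v) { output.add(k + "=" + v); }
  public void progress() { progressCalls++; }
}
```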
