dynamically loading C++ mapper/reducer classes in map/reduce jobs
-----------------------------------------------------------------

                 Key: HADOOP-913
                 URL: https://issues.apache.org/jira/browse/HADOOP-913
             Project: Hadoop
          Issue Type: New Feature
            Reporter: Runping Qi



It is highly desirable for the current map/reduce framework to be able to call 
functions written in C++ (or other languages).

I am proposing a generic extension to the current framework to achieve the 
above goal. 
The extension is an application-level solution, similar in spirit to 
HadoopStreaming, and thus has no impact on Hadoop core.
It will maintain the native map/reduce execution model. 

The basic idea is to use sockets/RPC to cross the language barrier.
In particular, we can implement a generic mapper/reducer class in Java as a 
proxy for calling functions written in another language.
The configure function of the class will create a process that opens a 
user-specified shared library and acts as an RPC server.
The map function of the class will just invoke an RPC call with the key/value 
pair. 
Such an RPC call is expected to return a list of key/value pairs, which the map 
function can then emit as its outputs.
Below is a sketch of the generic class:

        public class MapRedCPPAdapter implements Mapper, Reducer {
                String sharedLibraryName;
                RPCProxy theServer;

                ...

                // Start the external process that loads the shared library
                // and serves RPC requests.
                public void configure(JobConf job) {
                        sharedLibraryName = job.get("shared.lib.name");
                        theServer = createServer(sharedLibraryName);
                }

                public void close() {
                        theServer.stop();
                }

                public void map(key, value, output, reporter) {
                        ArrayList pairs = invokeRemoteMap(theServer, key, value);
                        emit(pairs);
                }

                public void reduce(key, values, output, reporter) {
                        ArrayList pairs = invokeRemoteReduce(theServer, key, values);
                        emit(pairs);
                }
        }
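
To make the RPC round trip behind map() more concrete, here is a minimal, 
self-contained sketch. Everything in it is an assumption made for illustration 
and not part of the proposal: the wire format (one "key<TAB>value" request 
line, answered by zero or more "key<TAB>value" lines and a terminating blank 
line), the names RpcMapSketch, startStandInServer and mapOnce, and the use of 
a Java thread on the loopback interface as a stand-in for the native process 
that would load the shared library. Only invokeRemoteMap mirrors a name from 
the sketch above, and its signature here is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class RpcMapSketch {

    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(
                new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    static PrintWriter writer(Socket s) throws IOException {
        return new PrintWriter(
                new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), false);
    }

    // Stand-in for the native RPC server (assumption: the real one would be a
    // separate process). It answers each "key<TAB>value" request with one
    // "word<TAB>1" line per word of the value, then a blank line.
    static Thread startStandInServer(ServerSocket listener) {
        Thread t = new Thread(() -> {
            try (Socket s = listener.accept();
                 BufferedReader in = reader(s);
                 PrintWriter out = writer(s)) {
                String request;
                while ((request = in.readLine()) != null) {
                    String[] kv = request.split("\t", 2);
                    String value = kv.length > 1 ? kv[1] : "";
                    for (String word : value.trim().split("\\s+")) {
                        if (!word.isEmpty()) {
                            out.println(word + "\t1");
                        }
                    }
                    out.println();   // blank line marks the end of the response
                    out.flush();
                }
            } catch (IOException e) {
                // Connection dropped; the stand-in simply exits.
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Java side of one call: send the pair, collect the returned pairs.
    static List<String[]> invokeRemoteMap(BufferedReader in, PrintWriter out,
                                          String key, String value) throws IOException {
        out.println(key + "\t" + value);
        out.flush();
        List<String[]> pairs = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            pairs.add(line.split("\t", 2));
        }
        return pairs;
    }

    // Starts the stand-in server, connects to it, and performs a single call.
    static List<String[]> mapOnce(String key, String value) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback)) {
            startStandInServer(listener);
            try (Socket s = new Socket(loopback, listener.getLocalPort());
                 BufferedReader in = reader(s);
                 PrintWriter out = writer(s)) {
                return invokeRemoteMap(in, out, key, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String[] pair : mapOnce("line-1", "to be or not to be")) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

In a real adapter the connection would be opened once in configure() and 
reused for every map() call, rather than once per call as in mapOnce. A 
tab-separated text format also breaks on keys that contain tabs or newlines, 
so an actual implementation would likely need a length-prefixed or otherwise 
binary-safe encoding.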

The cons of this approach are the overhead associated with 
RPC calls and the creation of an additional process per mapper/reducer task.
The pros are that the extension is clean, generic, and simple. It is also 
applicable to other foreign languages.





        
