[
https://issues.apache.org/jira/browse/HADOOP-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Owen O'Malley resolved HADOOP-913.
----------------------------------
Resolution: Duplicate
Fix Version/s: 0.11.0
Duplicate of HADOOP-234.
> dynamically loading C++ mapper/reducer classes in map/reduce jobs
> ------------------------------------------------------------------
>
> Key: HADOOP-913
> URL: https://issues.apache.org/jira/browse/HADOOP-913
> Project: Hadoop
> Issue Type: New Feature
> Reporter: Runping Qi
> Fix For: 0.11.0
>
>
> It is highly desirable for the current map/reduce framework to be able to
> call functions in C++ (or other languages).
> I am proposing a generic extension to the current framework to achieve the
> above goal.
> The extension is an application-level solution, similar in spirit to
> HadoopStreaming, and thus has no impact on Hadoop core.
> It will maintain the native map/reduce execution model.
> The basic idea is to use sockets/RPC to cross the language barrier.
> In particular, we can implement a generic mapper/reducer class in Java as a
> proxy for calling functions in other languages.
> The configure function of the class will create a process that opens a
> user-specified shared library and acts as an RPC server.
> The map function of the class will just invoke an RPC call with the key/value
> pair.
> Such an RPC call is expected to return a list of key/value pairs. The map
> function can then emit the outputs.
> Below is a sketch of the generic class:
> public class MapRedCPPAdapter implements Mapper, Reducer {
>     String sharedLibraryName;
>     RPCProxy theServer;
>
>     ...
>     // Start the helper process that loads the shared library and serves RPCs.
>     public void configure(JobConf job) {
>         sharedLibraryName = job.get("shared.lib.name");
>         theServer = createServer(sharedLibraryName);
>     }
>     public void close() {
>         theServer.stop();
>     }
>     // Forward one key/value pair and emit the pairs that come back.
>     public void map(key, value, output, reporter) {
>         ArrayList pairs = invokeRemoteMap(theServer, key, value);
>         emit(pairs);
>     }
>     // Forward one key with its values and emit the pairs that come back.
>     public void reduce(key, values, output, reporter) {
>         ArrayList pairs = invokeRemoteReduce(theServer, key, values);
>         emit(pairs);
>     }
> }
> The cons of this approach are the overhead associated with
> RPC calls and the creation of an additional process per mapper/reducer task.
> The pros are that the extension is clean, generic, and simple. It is also
> applicable to other foreign languages.
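The request/response round trip described in the quoted proposal can be illustrated with a small, self-contained Java sketch. Everything in it is an assumption made for illustration: the issue does not specify a wire format, so this uses a made-up line protocol (a "key<TAB>value" request, answered by a count line and then one "key<TAB>value" line per output pair), and the class and helper names are hypothetical. A thread in the same JVM stands in for the external process that, in the proposal, would load the user's shared library; no Hadoop classes are used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; none of this comes from Hadoop or the issue.
class LineRpcSketch {

    // Stand-in for the external helper process: for each "key<TAB>value" request
    // it replies with a count line followed by one "word<TAB>1" line per word.
    static void serve(ServerSocket listener) {
        Thread t = new Thread(() -> {
            try (Socket s = listener.accept();
                 BufferedReader in = reader(s);
                 PrintWriter out = writer(s)) {
                String line;
                while ((line = in.readLine()) != null) {
                    String value = line.split("\t", 2)[1].trim();
                    String[] words = value.isEmpty() ? new String[0] : value.split("\\s+");
                    out.println(words.length);
                    for (String w : words) {
                        out.println(w + "\t1");
                    }
                }
            } catch (IOException e) {
                // The client went away; nothing to clean up in this sketch.
            }
        });
        t.setDaemon(true);
        t.start();
    }

    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(
                new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    static PrintWriter writer(Socket s) throws IOException {
        // autoflush=true so each println is sent immediately
        return new PrintWriter(
                new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    // Proxy side of one call: send the pair, then read back the list of output pairs.
    static List<String[]> invokeRemoteMap(BufferedReader in, PrintWriter out,
                                          String key, String value) throws IOException {
        out.println(key + "\t" + value);
        int count = Integer.parseInt(in.readLine());
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            pairs.add(in.readLine().split("\t", 2));
        }
        return pairs;
    }

    // Starts the stand-in server on a loopback port and performs a single call.
    static List<String[]> mapOnce(String key, String value) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback)) {
            serve(listener);
            try (Socket s = new Socket(loopback, listener.getLocalPort());
                 BufferedReader in = reader(s);
                 PrintWriter out = writer(s)) {
                return invokeRemoteMap(in, out, key, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String[] pair : mapOnce("0", "to be or not")) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

Running main sends one input record and prints each word of the value paired with 1. The sketch also makes the cost noted in the proposal visible: every map call is a blocking round trip. A text protocol like this one breaks on keys or values that contain tabs or newlines, so a real adapter would more likely need a length-prefixed binary encoding.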
--
This message is automatically generated by JIRA.