[ http://issues.apache.org/jira/browse/NUTCH-191?page=comments#action_12364739 ]
Owen O'Malley commented on NUTCH-191:
-------------------------------------
Wouldn't it be appropriate to make input splitting into a task, so that
getSplits could be run by the TaskTrackerChild? That way the current interfaces
would remain and the user could override it from the job.jar.
An example where we would find it useful is where the map input is coming from
external servers over sockets. getSplits could return splits of the form
FileSplit("host:port", 0, 1000), and the RecordReader would need to know how to
translate that name into a data stream.
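
To make the "host:port" idea concrete, here is a minimal sketch of how a reader might turn such a split name into a data stream. SocketSplitName and its methods are hypothetical names chosen for illustration; they are not part of Nutch, and the real RecordReader wiring is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Nutch: interprets a split "name" of the
// form "host:port" as a network endpoint instead of a file path.
final class SocketSplitName {
    final String host;
    final int port;

    private SocketSplitName(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Parses "host:port"; the port is whatever follows the last ':'.
    static SocketSplitName parse(String name) {
        int colon = name.lastIndexOf(':');
        if (colon <= 0 || colon == name.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got " + name);
        }
        int port = Integer.parseInt(name.substring(colon + 1));
        if (port < 0 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return new SocketSplitName(name.substring(0, colon), port);
    }

    // Opens the stream that a RecordReader-like class would read records from.
    // The caller is responsible for closing the returned stream.
    InputStream open(int timeoutMillis) throws IOException {
        Socket socket = new Socket();
        socket.connect(new InetSocketAddress(host, port), timeoutMillis);
        return socket.getInputStream();
    }
}
```

A reader could call SocketSplitName.parse(name).open(5000) and treat the split's start and length as a record range rather than a byte offset; that interpretation is an assumption, since the comment does not say what the two numbers would mean for a socket source.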
> InputFormat used in job must be in JobTracker classpath (not loaded from job
> JAR)
> ---------------------------------------------------------------------------------
>
> Key: NUTCH-191
> URL: http://issues.apache.org/jira/browse/NUTCH-191
> Project: Nutch
> Type: Bug
> Versions: 0.8-dev
> Environment: ~20 node nutch mapreduce environment, running SVN trunk, on
> Linux
> Reporter: Bryan Pendleton
> Priority: Minor
>
> During development, I've been creating/tweaking custom InputFormat
> implementations. However, when you try to run a job against a running
> cluster, you get:
> Exception in thread "main" java.io.IOException: java.lang.RuntimeException:
> java.lang.RuntimeException: java.lang.ClassNotFoundException:
> my.custom.InputFormat
> at org.apache.nutch.ipc.Client.call(Client.java:294)
> at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
> at $Proxy0.submitJob(Unknown Source)
> at org.apache.nutch.mapred.JobClient.submitJob(JobClient.java:259)
> at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:288)
> at com.parc.uir.wikipedia.WikipediaJob.main(WikipediaJob.java:85)
> This error goes away if I restart the TaskTrackers/JobTracker with a
> classpath which includes the needed code. Other classes (Mapper, Reducer)
> appear to be available out of the jar file specified in the JobConf, but not
> the InputFormat. Obviously, it's less than ideal to have to restart the
> JobTracker whenever there's a change to a job-specific class.
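
The failure described in the report is what happens when a class name is resolved against a classpath that does not contain the job jar. A rough sketch of the alternative, resolving the class through a loader that includes the jar, might look like the following. JobJarLoader is a hypothetical name for illustration only; this is not how Nutch loads job classes.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

// Hypothetical helper, not Nutch code: resolves a job-specific class by name
// from the job jar rather than from the daemon's own classpath.
final class JobJarLoader {
    static Class<?> load(Path jobJar, String className)
            throws MalformedURLException, ClassNotFoundException {
        URL[] urls = { jobJar.toUri().toURL() };
        // Parent-first delegation: JDK and daemon classes still resolve as usual,
        // and only names the parent cannot find are looked up in the job jar.
        // The loader is deliberately left open, because closing it would stop the
        // returned class from loading its own dependencies later.
        ClassLoader loader =
                new URLClassLoader(urls, JobJarLoader.class.getClassLoader());
        return Class.forName(className, true, loader);
    }
}
```

If the jar does not contain the requested class, the call still fails with ClassNotFoundException, which is the same symptom as in the stack trace above.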