> I think some changes will be needed with the instructions.

Thanks Richard, I appreciate your effort. Yes, as a newbie I struggle with those, but I'd be happy to participate in your effort to make them better.
>
> If you have a local Hadoop installed then it goes to the local job tracker.
> You'll have to set up your Hadoop to point to the correct cluster. There are
> several ways to do it, but you can set your HADOOP_CONF_DIR to point to the
> correct hadoop conf xml (i.e. the location of mapred-site.xml) for the remote
> clusters.

Ok, can we see an example? I just tried several ways to define mapred.job.tracker inside /opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml, as explained in the embedded comment, and then pointed HADOOP_CONF_DIR to /opt/mapr/hadoop/hadoop-0.20.2/conf in the environment, but calling the hadoop consumer still calls the default local jobtracker and fails:

...
11/08/31 12:33:48 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 0 time(s).
11/08/31 12:33:49 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 1 time(s).
...

> The hadoop map tasks will need to connect to the kafka server port (the
> broker uri/port).
>
> On Tue, Aug 30, 2011 at 4:50 PM, Ben Ciceron <b...@triggit.com> wrote:
>
>> Thanks for confirming.
>>
>> So when I follow the instructions to run the hadoop consumer
>> (https://github.com/kafka-dev/kafka/tree/master/contrib/hadoop-consumer),
>> I see my mapred job being submitted properly on hostB (the jobtracker), but
>> it always fails with:
>>
>> console output on hostA:
>>
>> 11/08/31 07:32:22 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 11/08/31 07:32:35 INFO mapred.FileInputFormat: Total input paths to process : 1
>> Hadoop job id=job_201108291829_0041
>> Exception in thread "main" java.lang.Exception: Hadoop ETL job failed!
>> Please check status on
>> http://localhost:9001/jobdetails.jsp?jobid=job_201108291829_0041
>>         at kafka.etl.impl.SimpleKafkaETLJob.execute(SimpleKafkaETLJob.java:82)
>>         at kafka.etl.impl.SimpleKafkaETLJob.main(SimpleKafkaETLJob.java:100)
>>
>> In the mapred log, for each tasktracker host I see:
>>
>> Meta VERSION="1" .
>> Job JOBID="job_201108291829_0041" JOBNAME="SimpleKafakETL" USER="root"
>> SUBMIT_TIME="1314747158794"
>> JOBCONF="maprfs://10\.18\.125\.176:7222/var/mapr/cluster/mapred/jobTracker/staging/root/\.staging/job_201108291829_0041/job\.xml"
>> VIEW_JOB="*" MODIFY_JOB="*" JOB_QUEUE="default" .
>> Job JOBID="job_201108291829_0041" JOB_PRIORITY="NORMAL" .
>> Job JOBID="job_201108291829_0041" JOB_STATUS="RUNNING" .
>> Job JOBID="job_201108291829_0041" LAUNCH_TIME="1314747158885"
>> TOTAL_MAPS="1" TOTAL_REDUCES="0" JOB_STATUS="PREP" .
>> Task TASKID="task_201108291829_0041_m_000000" TASK_TYPE="MAP"
>> START_TIME="1314747160010"
>> SPLITS="/default-rack/hadoop2,/default-rack/hadoop9,/default-rack/hadoop6" .
>> MapAttempt TASK_TYPE="MAP" TASKID="task_201108291829_0041_m_000000"
>> TASK_ATTEMPT_ID="attempt_201108291829_0041_m_000000_0"
>> START_TIME="1314747160121"
>> TRACKER_NAME="tracker_hadoop9:localhost/127\.0\.0\.1:59411"
>> HTTP_PORT="50060" .
>> MapAttempt TASK_TYPE="MAP" TASKID="task_201108291829_0041_m_000000"
>> TASK_ATTEMPT_ID="attempt_201108291829_0041_m_000000_0"
>> TASK_STATUS="FAILED" FINISH_TIME="1314747164349" HOSTNAME="hadoop9"
>> ERROR="java\.io\.IOException: java\.net\.ConnectException: Connection refused
>>         at kafka\.etl\.KafkaETLRecordReader\.next(KafkaETLRecordReader\.java:155)
>>         at kafka\.etl\.KafkaETLRecordReader\.next(KafkaETLRecordReader\.java:14)
>>         at org\.apache\.hadoop\.mapred\.MapTask$TrackedRecordReader\.moveToNext(MapTask\.java:210)
>>         at org\.apache\.hadoop\.mapred\.MapTask$TrackedRecordReader\.next(MapTask\.java:195)
>>         at org\.apache\.hadoop\.mapred\.MapRunner\.run(MapRunner\.java:48)
>>         at org\.apache\.hadoop\.mapred\.MapTask\.runOldMapper(MapTask\.java:393)
>>         at org\.apache\.hadoop\.mapred\.MapTask\.run(MapTask\.java:326)
>>         at org\.apache\.hadoop\.mapred\.Child$4\.run(Child\.java:268)
>>         at java\.security\.AccessController\.doPrivileged(Native Method)
>>         at javax\.security\.auth\.Subject\.doAs(Subject\.java:396)
>>         at org\.apache\.hadoop\.security\.UserGroupInformation\.doAs(UserGroupInformation\.java:1074)
>>         at org\.apache\.hadoop\.mapred\.Child\.main(Child\.java:262)
>> Caused by: java\.net\.ConnectException: Connection refused
>>         at sun\.nio\.ch\.Net\.connect(Native Method)
>>         at sun\.nio\.ch\.SocketChannelImpl\.connect(SocketChannelImpl\.java:500)
>>         at kafka\.consumer\.SimpleConsumer\.connect(SimpleConsumer\.scala:54)
>>         at kafka\.consumer\.SimpleConsumer\.getOrMakeConnection(SimpleConsumer\.scala:193)
>>         at kafka\.consumer\.SimpleConsumer\.getOffsetsBefore(SimpleConsumer\.scala:156)
>>         at kafka\.javaapi\.consumer\.SimpleConsumer\.getOffsetsBefore(SimpleConsumer\.scala:65)
>>         at kafka\.etl\.KafkaETLContext\.getOffsetRange(KafkaETLContext\.java:209)
>>         at kafka\.etl\.KafkaETLContext\.<init>(KafkaETLContext\.java:97)
>>         at kafka\.etl\.KafkaETLRecordReader\.next(KafkaETLRecordReader\.java:115)
>>         \.\.\. 11 more
>>
>> Which other port do I need to open between hostA and the tasktrackers?
>> Please note I can submit a simple non-kafka job from the same hostA to
>> hadoop and it completes successfully.
>>
>> Cheers,
>> Ben-
>>
>> On Tue, Aug 30, 2011 at 12:04 PM, Richard Park <richard.b.p...@gmail.com> wrote:
>> > The answer should be yes; each process should be able to run on different
>> > hosts. We are currently doing this:
>> >
>> > Host A submits the kafka hadoop job to the job tracker on Host B,
>> > Host B then connects to Host C (or many Host C's).
>> >
>> > I planned on having another look at the example to see if there are steps
>> > that are missing, or if the examples need to be beefed up.
>> >
>> > Thanks,
>> > -Richard
>> >
>> > On Tue, Aug 30, 2011 at 11:39 AM, Ben Ciceron <b...@triggit.com> wrote:
>> >
>> >> Let me rephrase this:
>> >>
>> >> Can any of the kafka processes run outside the hadoop cluster, as long as
>> >> they can connect to the hadoop processes from that host? E.g.:
>> >>
>> >> hostA (NOT in the hadoop cluster): runs the kafka hadoop consumer
>> >> hostB (in the hadoop cluster): runs the jobtracker
>> >>
>> >> Cheers,
>> >> Ben-
>> >>
>> >> On Mon, Aug 29, 2011 at 4:59 PM, Jun Rao <jun...@gmail.com> wrote:
>> >> > My understanding is that it's not tied to localhost. You just need to
>> >> > change the jobtracker setting in your Hadoop config.
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Jun
>> >> >
>> >> > On Thu, Aug 25, 2011 at 4:31 PM, Ben Ciceron <b...@triggit.com> wrote:
>> >> >
>> >> >> Hello,
>> >> >>
>> >> >> Does the kafka hadoop consumer expect the jobtracker to run locally only?
>> >> >> It seems to expect it locally (localhost/127.0.0.1:9001).
>> >> >> Is that a requirement, or is there a way to change it to a remote URI?
>> >> >>
>> >> >> Cheers,
>> >> >> Ben-
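
Putting the advice in this thread together, here is a minimal sketch of the remote-jobtracker setup being described. It assumes the MapR conf path mentioned above; hostB:9001 is a placeholder for the real jobtracker host and port, not a value from the thread:

```xml
<?xml version="1.0"?>
<!-- Sketch of /opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml.
     hostB:9001 is a placeholder; substitute the remote jobtracker's
     actual address so job submission no longer falls back to
     localhost/127.0.0.1:9001. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hostB:9001</value>
  </property>
</configuration>
```

Then export HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-0.20.2/conf in the environment of the shell that submits the job, so the hadoop client actually reads this file. Note that a correct jobtracker address only fixes the submission side; the Connection refused raised inside SimpleConsumer.connect in the map tasks above means each tasktracker host must also be able to reach the kafka broker port (the broker uri/port the consumer is configured with), in addition to the jobtracker port.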