I think some changes will be needed to the instructions.

If you have a local Hadoop installed, the job gets submitted to the local job
tracker. You'll have to set up your Hadoop client to point to the correct
cluster. There are several ways to do it, but you can set HADOOP_CONF_DIR to
point to the correct Hadoop conf directory (i.e., the location of
mapred-site.xml) for the remote cluster.
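
A minimal sketch, assuming the remote jobtracker is hostB:9001 as in your
setup below (the conf path here is just a placeholder):

  # point Hadoop clients at the remote cluster's conf directory
  export HADOOP_CONF_DIR=/path/to/remote-cluster-conf

  # mapred-site.xml in that directory should name the remote jobtracker:
  <configuration>
    <property>
      <name>mapred.job.tracker</name>
      <value>hostB:9001</value>
    </property>
  </configuration>

With that set, the job client stops defaulting to localhost:9001 and submits
to the remote cluster instead.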

The Hadoop map tasks will need to connect to the Kafka server port (the
broker URI/port).
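
The ConnectException in your trace below comes from the map task's
SimpleConsumer trying to reach the broker, so the port to check is the
broker's, not a Hadoop one. In the contrib example the broker location comes
from the properties file you pass to the job; if I remember the example
config right, the entry looks like this (the property name may differ in
your version):

  kafka.server.uri=tcp://<kafka-broker-host>:9092

Note that if that URI still says localhost, each map task will try to connect
to a broker on its own tasktracker host and get connection refused. A quick
sanity check from one of the tasktracker hosts (hadoop9, say):

  telnet <kafka-broker-host> 9092

(9092 is just the default broker port; use whatever your broker is configured
with.)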


On Tue, Aug 30, 2011 at 4:50 PM, Ben Ciceron <b...@triggit.com> wrote:

> thx for confirming.
>
> so when I follow the instructions to run the hadoop consumer
> (https://github.com/kafka-dev/kafka/tree/master/contrib/hadoop-consumer)
> I see my mapred job being submitted properly on hostB (jobtracker) but
> it always fails with:
>
> console output on hostA
>
> 11/08/31 07:32:22 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the
> same.
> 11/08/31 07:32:35 INFO mapred.FileInputFormat: Total input paths to process
> : 1
> Hadoop job id=job_201108291829_0041
> Exception in thread "main" java.lang.Exception: Hadoop ETL job failed!
> Please check status on
> http://localhost:9001/jobdetails.jsp?jobid=job_201108291829_0041
>        at
> kafka.etl.impl.SimpleKafkaETLJob.execute(SimpleKafkaETLJob.java:82)
>        at kafka.etl.impl.SimpleKafkaETLJob.main(SimpleKafkaETLJob.java:100)
>
> in the mapred log, for each tasktracker host I see:
>
> Meta VERSION="1" .
> Job JOBID="job_201108291829_0041" JOBNAME="SimpleKafakETL" USER="root"
> SUBMIT_TIME="1314747158794"
>
> JOBCONF="maprfs://10\.18\.125\.176:7222/var/mapr/cluster/mapred/jobTracker/staging/root/\.staging/job_201108291829_0041/job\.xml"
> VIEW_JOB="*" MODIFY_JOB="*" JOB_QUEUE="default" .
> Job JOBID="job_201108291829_0041" JOB_PRIORITY="NORMAL" .
> Job JOBID="job_201108291829_0041" JOB_STATUS="RUNNING" .
> Job JOBID="job_201108291829_0041" LAUNCH_TIME="1314747158885"
> TOTAL_MAPS="1" TOTAL_REDUCES="0" JOB_STATUS="PREP" .
> Task TASKID="task_201108291829_0041_m_000000" TASK_TYPE="MAP"
> START_TIME="1314747160010"
> SPLITS="/default-rack/hadoop2,/default-rack/hadoop9,/default-rack/hadoop6"
> .
> MapAttempt TASK_TYPE="MAP" TASKID="task_201108291829_0041_m_000000"
> TASK_ATTEMPT_ID="attempt_201108291829_0041_m_000000_0"
> START_TIME="1314747160121"
> TRACKER_NAME="tracker_hadoop9:localhost/127\.0\.0\.1:59411"
> HTTP_PORT="50060" .
> MapAttempt TASK_TYPE="MAP" TASKID="task_201108291829_0041_m_000000"
> TASK_ATTEMPT_ID="attempt_201108291829_0041_m_000000_0"
> TASK_STATUS="FAILED" FINISH_TIME="1314747164349" HOSTNAME="hadoop9"
> ERROR="java\.io\.IOException: java\.net\.ConnectException: Connection
> refused
>        at
> kafka\.etl\.KafkaETLRecordReader\.next(KafkaETLRecordReader\.java:155)
>        at
> kafka\.etl\.KafkaETLRecordReader\.next(KafkaETLRecordReader\.java:14)
>        at
> org\.apache\.hadoop\.mapred\.MapTask$TrackedRecordReader\.moveToNext(MapTask\.java:210)
>        at
> org\.apache\.hadoop\.mapred\.MapTask$TrackedRecordReader\.next(MapTask\.java:195)
>        at org\.apache\.hadoop\.mapred\.MapRunner\.run(MapRunner\.java:48)
>        at
> org\.apache\.hadoop\.mapred\.MapTask\.runOldMapper(MapTask\.java:393)
>        at org\.apache\.hadoop\.mapred\.MapTask\.run(MapTask\.java:326)
>        at org\.apache\.hadoop\.mapred\.Child$4\.run(Child\.java:268)
>        at java\.security\.AccessController\.doPrivileged(Native Method)
>        at javax\.security\.auth\.Subject\.doAs(Subject\.java:396)
>        at
> org\.apache\.hadoop\.security\.UserGroupInformation\.doAs(UserGroupInformation\.java:1074)
>        at org\.apache\.hadoop\.mapred\.Child\.main(Child\.java:262)
> Caused by: java\.net\.ConnectException: Connection refused
>        at sun\.nio\.ch\.Net\.connect(Native Method)
>        at
> sun\.nio\.ch\.SocketChannelImpl\.connect(SocketChannelImpl\.java:500)
>        at
> kafka\.consumer\.SimpleConsumer\.connect(SimpleConsumer\.scala:54)
>        at
> kafka\.consumer\.SimpleConsumer\.getOrMakeConnection(SimpleConsumer\.scala:193)
>        at
> kafka\.consumer\.SimpleConsumer\.getOffsetsBefore(SimpleConsumer\.scala:156)
>        at
> kafka\.javaapi\.consumer\.SimpleConsumer\.getOffsetsBefore(SimpleConsumer\.scala:65)
>        at
> kafka\.etl\.KafkaETLContext\.getOffsetRange(KafkaETLContext\.java:209)
>        at kafka\.etl\.KafkaETLContext\.<init>(KafkaETLContext\.java:97)
>        at
> kafka\.etl\.KafkaETLRecordReader\.next(KafkaETLRecordReader\.java:115)
>        \.\.\. 11 more
>
> which other port do I need to open between hostA and the tasktrackers?
> please note I can send a simple non-Kafka job from the same hostA to
> hadoop and it completes successfully.
>
> Cheers,
> Ben-
>
>
>
>
> On Tue, Aug 30, 2011 at 12:04 PM, Richard Park <richard.b.p...@gmail.com>
> wrote:
> > The answer should be yes, each process should be able to run on different
> > hosts. We are currently doing this.
> >
> > Host A submits the kafka hadoop job to the job tracker on Host B,
> > Host B then connects to Host C (or many host C's).
> >
> > I planned on having a look at the example again to see if there are steps
> > that are missing, or if the examples need to be beefed up.
> >
> > Thanks,
> > -Richard
> >
> >
> > On Tue, Aug 30, 2011 at 11:39 AM, Ben Ciceron <b...@triggit.com> wrote:
> >
> >> let me rephrase this:
> >>
> >> can any of the kafka processes run outside the hadoop cluster as long
> >> as they can connect to the hadoop processes from that host?
> >> e.g :
> >>
> >> hostA (NOT in the hadoop cluster): runs kafka hadoop consumer
> >> hostB (in the hadoop cluster): runs jobtracker
> >>
> >>
> >> Cheers,
> >> Ben-
> >>
> >>
> >>
> >>
> >> On Mon, Aug 29, 2011 at 4:59 PM, Jun Rao <jun...@gmail.com> wrote:
> >> > My understanding is that it's not tied to localhost. You just need to
> >> > change the jobtracker setting in your Hadoop config.
> >> >
> >> > Thanks,
> >> >
> >> > Jun
> >> >
> >> > On Thu, Aug 25, 2011 at 4:31 PM, Ben Ciceron <b...@triggit.com> wrote:
> >> >
> >> >> Hello,
> >> >>
> >> >> does the kafka hadoop consumer expect the jobtracker to run locally
> >> >> only? It seems to expect it locally (localhost/127.0.0.1:9001).
> >> >> Is it a requirement, or is there a way to change it to a remote URI?
> >> >>
> >> >> Cheers,
> >> >> Ben-
> >> >>
> >> >
> >>
> >
>
