> I think some changes will be needed with the instructions.

Thanks Richard, I appreciate your effort. Yes, as a newbie I struggle with those, but I'd be happy to participate in your effort to make them better.
>
> If you have a local Hadoop installed then it goes to the local job tracker.
> You'll have to set up your Hadoop to point to the correct cluster. There are
> several ways to do it, but you can set your HADOOP_CONF_DIR to point to the
> correct hadoop conf xml (i.e. the location of mapred-site.xml) for the remote
> clusters.

Ok, can we see an example? I just tried several ways to define mapred.job.tracker inside /opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml, as explained in the embedded comment, and then pointed HADOOP_CONF_DIR to /opt/mapr/hadoop/hadoop-0.20.2/conf in the environment, but calling the hadoop consumer still calls the default local jobtracker and fails:

...
11/08/31 12:33:48 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 0 time(s).
11/08/31 12:33:49 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9001. Already tried 1 time(s).
...

> The hadoop map tasks will need to connect to the kafka server port (the
> broker uri/port).
>
> On Tue, Aug 30, 2011 at 4:50 PM, Ben Ciceron <b...@triggit.com> wrote:
>
>> Thanks for confirming.
>>
>> So when I follow the instructions to run the hadoop consumer
>> (https://github.com/kafka-dev/kafka/tree/master/contrib/hadoop-consumer),
>> I see my mapred job being submitted properly on hostB (the jobtracker), but
>> it always fails with:
>>
>> console output on hostA:
>>
>> 11/08/31 07:32:22 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 11/08/31 07:32:35 INFO mapred.FileInputFormat: Total input paths to process : 1
>> Hadoop job id=job_201108291829_0041
>> Exception in thread "main" java.lang.Exception: Hadoop ETL job failed!
>> Please check status on
>> http://localhost:9001/jobdetails.jsp?jobid=job_201108291829_0041
>>         at kafka.etl.impl.SimpleKafkaETLJob.execute(SimpleKafkaETLJob.java:82)
>>         at kafka.etl.impl.SimpleKafkaETLJob.main(SimpleKafkaETLJob.java:100)
>>
>> In the mapred log, for each tasktracker host I see:
>>
>> Meta VERSION="1" .
>> Job JOBID="job_201108291829_0041" JOBNAME="SimpleKafakETL" USER="root"
>> SUBMIT_TIME="1314747158794"
>> JOBCONF="maprfs://10\.18\.125\.176:7222/var/mapr/cluster/mapred/jobTracker/staging/root/\.staging/job_201108291829_0041/job\.xml"
>> VIEW_JOB="*" MODIFY_JOB="*" JOB_QUEUE="default" .
>> Job JOBID="job_201108291829_0041" JOB_PRIORITY="NORMAL" .
>> Job JOBID="job_201108291829_0041" JOB_STATUS="RUNNING" .
>> Job JOBID="job_201108291829_0041" LAUNCH_TIME="1314747158885"
>> TOTAL_MAPS="1" TOTAL_REDUCES="0" JOB_STATUS="PREP" .
>> Task TASKID="task_201108291829_0041_m_000000" TASK_TYPE="MAP"
>> START_TIME="1314747160010"
>> SPLITS="/default-rack/hadoop2,/default-rack/hadoop9,/default-rack/hadoop6" .
>> MapAttempt TASK_TYPE="MAP" TASKID="task_201108291829_0041_m_000000"
>> TASK_ATTEMPT_ID="attempt_201108291829_0041_m_000000_0"
>> START_TIME="1314747160121"
>> TRACKER_NAME="tracker_hadoop9:localhost/127\.0\.0\.1:59411"
>> HTTP_PORT="50060" .
>> MapAttempt TASK_TYPE="MAP" TASKID="task_201108291829_0041_m_000000"
>> TASK_ATTEMPT_ID="attempt_201108291829_0041_m_000000_0"
>> TASK_STATUS="FAILED" FINISH_TIME="1314747164349" HOSTNAME="hadoop9"
>> ERROR="java\.io\.IOException: java\.net\.ConnectException: Connection refused
>>         at kafka\.etl\.KafkaETLRecordReader\.next(KafkaETLRecordReader\.java:155)
>>         at kafka\.etl\.KafkaETLRecordReader\.next(KafkaETLRecordReader\.java:14)
>>         at org\.apache\.hadoop\.mapred\.MapTask$TrackedRecordReader\.moveToNext(MapTask\.java:210)
>>         at org\.apache\.hadoop\.mapred\.MapTask$TrackedRecordReader\.next(MapTask\.java:195)
>>         at org\.apache\.hadoop\.mapred\.MapRunner\.run(MapRunner\.java:48)
>>         at org\.apache\.hadoop\.mapred\.MapTask\.runOldMapper(MapTask\.java:393)
>>         at org\.apache\.hadoop\.mapred\.MapTask\.run(MapTask\.java:326)
>>         at org\.apache\.hadoop\.mapred\.Child$4\.run(Child\.java:268)
>>         at java\.security\.AccessController\.doPrivileged(Native Method)
>>         at javax\.security\.auth\.Subject\.doAs(Subject\.java:396)
>>         at org\.apache\.hadoop\.security\.UserGroupInformation\.doAs(UserGroupInformation\.java:1074)
>>         at org\.apache\.hadoop\.mapred\.Child\.main(Child\.java:262)
>> Caused by: java\.net\.ConnectException: Connection refused
>>         at sun\.nio\.ch\.Net\.connect(Native Method)
>>         at sun\.nio\.ch\.SocketChannelImpl\.connect(SocketChannelImpl\.java:500)
>>         at kafka\.consumer\.SimpleConsumer\.connect(SimpleConsumer\.scala:54)
>>         at kafka\.consumer\.SimpleConsumer\.getOrMakeConnection(SimpleConsumer\.scala:193)
>>         at kafka\.consumer\.SimpleConsumer\.getOffsetsBefore(SimpleConsumer\.scala:156)
>>         at kafka\.javaapi\.consumer\.SimpleConsumer\.getOffsetsBefore(SimpleConsumer\.scala:65)
>>         at kafka\.etl\.KafkaETLContext\.getOffsetRange(KafkaETLContext\.java:209)
>>         at kafka\.etl\.KafkaETLContext\.<init>(KafkaETLContext\.java:97)
>>         at kafka\.etl\.KafkaETLRecordReader\.next(KafkaETLRecordReader\.java:115)
>>         \.\.\. 11 more
>>
>> Which other port do I need to open between hostA and the tasktrackers?
>> Please note I can submit a simple non-kafka job from the same hostA to
>> hadoop and it completes successfully.
>>
>> Cheers,
>> Ben-
>>
>> On Tue, Aug 30, 2011 at 12:04 PM, Richard Park <richard.b.p...@gmail.com> wrote:
>> > The answer should be yes; each process should be able to run on different
>> > hosts. We are currently doing this:
>> >
>> > Host A submits the kafka hadoop job to the job tracker on Host B,
>> > Host B then connects to Host C (or many Host C's).
>> >
>> > I planned on having another look at the example to see if there are steps
>> > that are missing, or if the examples need to be beefed up.
>> >
>> > Thanks,
>> > -Richard
>> >
>> > On Tue, Aug 30, 2011 at 11:39 AM, Ben Ciceron <b...@triggit.com> wrote:
>> >
>> >> Let me rephrase this:
>> >>
>> >> Can any of the kafka processes run outside the hadoop cluster, as long as
>> >> they can connect to the hadoop processes from that host? E.g.:
>> >>
>> >> hostA (NOT in the hadoop cluster): runs the kafka hadoop consumer
>> >> hostB (in the hadoop cluster): runs the jobtracker
>> >>
>> >> Cheers,
>> >> Ben-
>> >>
>> >> On Mon, Aug 29, 2011 at 4:59 PM, Jun Rao <jun...@gmail.com> wrote:
>> >> > My understanding is that it's not tied to localhost. You just need to
>> >> > change the jobtracker setting in your Hadoop config.
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Jun
>> >> >
>> >> > On Thu, Aug 25, 2011 at 4:31 PM, Ben Ciceron <b...@triggit.com> wrote:
>> >> >
>> >> >> Hello,
>> >> >>
>> >> >> Does the kafka hadoop consumer expect the jobtracker to run locally only?
>> >> >> It seems to expect it locally (localhost/127.0.0.1:9001).
>> >> >> Is that a requirement, or is there a way to change it to a remote URI?
>> >> >>
>> >> >> Cheers,
>> >> >> Ben-
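
Putting the advice in this thread together, here is a minimal sketch of the remote-jobtracker setup being described. It assumes the MapR conf path mentioned above; hostB:9001 is a placeholder for the real jobtracker host and port, not a value from the thread:

```xml
<?xml version="1.0"?>
<!-- Sketch of /opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml.
     hostB:9001 is a placeholder; substitute the remote jobtracker's
     actual address so job submission no longer falls back to
     localhost/127.0.0.1:9001. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hostB:9001</value>
  </property>
</configuration>
```

Then export HADOOP_CONF_DIR=/opt/mapr/hadoop/hadoop-0.20.2/conf in the environment of the shell that submits the job, so the hadoop client actually reads this file. Note that a correct jobtracker address only fixes the submission side; the Connection refused raised inside SimpleConsumer.connect in the map tasks above means each tasktracker host must also be able to reach the kafka broker port (the broker uri/port the consumer is configured with), in addition to the jobtracker port.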