Re: start-data-processors.sh

Corbin Hoenes Thu, 28 Jan 2010 14:10:59 -0800

Eric/Ari - Thanks... got it working.

On Jan 28, 2010, at 3:01 PM, Ariel Rabkin wrote:


> You can also try defining fs.default.name in chukwa-demux-conf.xml
> 
> On Thu, Jan 28, 2010 at 1:36 PM, Eric Yang <ey...@yahoo-inc.com> wrote:
>> Echoing what Ari said.  Make sure your hdfs-site.xml is in HADOOP_CONF_DIR
>> or in the class path.  Demux uses this file to determine location of your
>> HDFS.
>> 
>> Regards,
>> Eric
>> 
>> 
>> On 1/28/10 11:58 AM, "Ariel Rabkin" <asrab...@gmail.com> wrote:
>> 
>>> We don't use demux at my site, so I'd love to have Eric or Jerome jump
>>> in here.  But that said:
>>> 
>>> I believe the typical way to set this up is to have conf/chukwa-env.sh
>>> define HADOOP_CONF_DIR; the filesystem is then specified via the
>>> Hadoop configuration (fs.default.name).  You shouldn't need to change
>>> chukwa-demux-conf.xml.
>>> 
>>> In re processSinkFiles -- What version of Chukwa are you using?  In
>>> Chukwa 0.3, the only formal release we've done so far, there's no
>>> processSinkFiles.sh, and the line in start-data-processors that
>>> references it has been commented out.  You don't need it; references
>>> to it are a historical artifact that should go away in the next
>>> release.
>>> 
>>> --Ari
>>> 
>>> On Thu, Jan 28, 2010 at 11:15 AM, Corbin Hoenes <cor...@tynt.com> wrote:
>>>> I'm having some difficulty with the demux part of setting up Chukwa.  I
>>>> assume I am supposed to run the start-data-processors.sh script to start up
>>>> all the MapReduce jobs that handle demux and archiving.
>>>> 
>>>> My goal is to pull the logs we are collecting out of the sink files and
>>>> into something we can start to run our pig scripts on.
>>>> 
>>>> When I run start-data-processors it gives me this though:
>>>> 
>>>> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/chukwa/demuxProcessing/mrInput
>>>>        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
>>>>        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
>>>>        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
>>>>        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:851)
>>>>        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:822)
>>>>        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:771)
>>>>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1290)
>>>>        at org.apache.hadoop.chukwa.extraction.demux.Demux.run(Demux.java:192)
>>>> 
>>>> It seems like I need to configure it to connect to HDFS rather than
>>>> file:/
>>>> 
>>>> Only docs I've found are here:
>>>> http://hadoop.apache.org/chukwa/docs/current/admin.html
>>>> Is there a guide to configuring chukwa-demux-conf.xml?
>>>> 
>>>> I also noticed start-data-processors.sh tries to start processSinkFiles.sh
>>>> which doesn't exist for me--do I need to get this script?
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> 
> -- 
> Ari Rabkin asrab...@gmail.com
> UC Berkeley Computer Science Department
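
For anyone landing on this thread with the same file:/ error, here is a minimal sketch of the check implied by the replies above. It assumes that HADOOP_CONF_DIR is exported from conf/chukwa-env.sh and that one of the *.xml files in that directory defines fs.default.name. The helper name has_default_fs is hypothetical and not part of Chukwa or Hadoop.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Chukwa or Hadoop): succeeds if any
# *.xml file in the given Hadoop configuration directory defines
# fs.default.name. Returns 2 if the directory does not exist.
has_default_fs() {
    conf_dir="$1"
    [ -d "$conf_dir" ] || return 2
    grep -l '<name>fs.default.name</name>' "$conf_dir"/*.xml >/dev/null 2>&1
}

# HADOOP_CONF_DIR is expected to come from conf/chukwa-env.sh; the fallback
# path below is only a placeholder so the check fails cleanly when it is unset.
if has_default_fs "${HADOOP_CONF_DIR:-/nonexistent}"; then
    echo "fs.default.name is defined under $HADOOP_CONF_DIR"
else
    echo "fs.default.name not found; Hadoop defaults to the local file system"
fi
```

If the check fails, the options mentioned in the thread are to point HADOOP_CONF_DIR at the directory holding your Hadoop site configuration, or to define fs.default.name in chukwa-demux-conf.xml.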
