I'm having some difficulty with the demux part of setting up Chukwa. I assume
I am supposed to run the start-data-processors.sh script to start up all the
MapReduce jobs that handle demux and archiving.

My goal is to pull the logs we are collecting out of the sink files and into 
something we can start to run our pig scripts on.

When I run start-data-processors.sh, though, it gives me this:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/chukwa/demuxProcessing/mrInput
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:851)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:822)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:771)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1290)
        at org.apache.hadoop.chukwa.extraction.demux.Demux.run(Demux.java:192)

This suggests the job is resolving its input path against the local
filesystem (file:/), so I probably need to configure it to use HDFS instead.
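
For reference, my guess is that the file:/ prefix means the demux job is not
picking up my Hadoop cluster configuration and is falling back to the local
filesystem default. As far as I understand, Hadoop takes its default
filesystem from the fs.default.name property, so I would expect something
like the fragment below to be needed somewhere the job can see it. The
hostname and port are placeholders, and I am only assuming this is the
setting that matters here:

```xml
<!-- core-site.xml (hadoop-site.xml on older Hadoop releases) -->
<!-- Hostname and port are placeholders, not values from my cluster -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000/</value>
  </property>
</configuration>
```

If that is right, the job would presumably also need to find this file (for
example via HADOOP_CONF_DIR), but I have not confirmed how Chukwa locates it.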

The only docs I've found are here:
http://hadoop.apache.org/chukwa/docs/current/admin.html
Is there a guide to configuring chukwa-demux-conf.xml?

I also noticed that start-data-processors.sh tries to start
processSinkFiles.sh, which doesn't exist for me. Do I need to get this script?

