I'm having some difficulty with the demux part of setting up Chukwa. I assume I am supposed to run the start-data-processors.sh script to start up all the MapReduce jobs that handle demux and archiving.
My goal is to pull the logs we are collecting out of the sink files and into something we can start to run our Pig scripts on. When I run start-data-processors.sh, though, it gives me this:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/chukwa/demuxProcessing/mrInput
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:851)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:822)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:771)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1290)
    at org.apache.hadoop.chukwa.extraction.demux.Demux.run(Demux.java:192)

The input path is being resolved against file:/, so it seems I need to configure it to connect to HDFS rather than the local filesystem.

The only docs I've found are here: http://hadoop.apache.org/chukwa/docs/current/admin.html

Is there a guide to configuring chukwa-demux-conf.xml?

I also noticed that start-data-processors.sh tries to start processSinkFiles.sh, which doesn't exist for me. Do I need to get this script?
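
To make the HDFS vs. file:/ guess concrete: as far as I understand, Hadoop of this vintage falls back to the local filesystem when fs.default.name is not set in the configuration it loads, so I suspect the demux job is not seeing a setting like the one below. This is only a sketch of what I think is missing. The namenode host and port are placeholders, and I don't know which config file or directory the Chukwa demux job actually reads it from, so that part is an assumption.

```xml
<!-- Sketch only: namenode.example.com:9000 is a placeholder, not my real cluster. -->
<!-- Assumption: this has to be visible to the demux job, e.g. via the Hadoop conf dir on its classpath. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode.example.com:9000</value>
</property>
```

If that is right, I would expect the error to refer to hdfs://.../chukwa/demuxProcessing/mrInput instead of file:/chukwa/demuxProcessing/mrInput.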