We don't use demux at my site, so I'd love to have Eric or Jerome jump in here. But that said:
I believe the typical way to set this up is to have conf/chukwa-env.sh define HADOOP_CONF_DIR; the filesystem is then specified via the Hadoop configuration. (fs.default.name) You shouldn't need to change chukwa-demux-conf.

In re processSinkFiles -- What version of Chukwa are you using? In Chukwa 0.3, the only formal release we've done so far, there's no processSinkFiles.sh, and the line in start-data-processors that references it has been commented out. You don't need it; references to it are a historical artifact that should go away in the next release.

--Ari

On Thu, Jan 28, 2010 at 11:15 AM, Corbin Hoenes <cor...@tynt.com> wrote:
> I'm having some difficulty with the demux part of setting up chukwa. I
> assume I am supposed to run the start-data-processors.sh script to startup
> all the map reduce jobs that handle demux and archiving.
>
> My goal is to pull the logs we are collecting out of the sink files and into
> something we can start to run our pig scripts on.
>
> When I run start-data-processors it gives me this though:
>
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/chukwa/demuxProcessing/mrInput
>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
>     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
>     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:851)
>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:822)
>     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:771)
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1290)
>     at org.apache.hadoop.chukwa.extraction.demux.Demux.run(Demux.java:192)
>
> Which seems like I need to configure it to try to connect to hdfs rather than
> file:/
>
> Only docs I've found are here:
> http://hadoop.apache.org/chukwa/docs/current/admin.html
> Is there a guide to configuring chukwa-demux-conf.xml?
>
> I also noticed start-data-processors.sh tries to start processSinkFiles.sh
> which doesn't exist for me--do I need to get this script?

-- 
Ari Rabkin asrab...@gmail.com
UC Berkeley Computer Science Department
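
For reference, the setup described in the reply might look roughly like the sketch below. The directory path, namenode host, and port are placeholders I've assumed for illustration, not values from this thread; only HADOOP_CONF_DIR and fs.default.name come from the discussion above.

```shell
# conf/chukwa-env.sh
# Assumption: your Hadoop configuration lives in /etc/hadoop/conf;
# substitute the directory your cluster actually uses.
export HADOOP_CONF_DIR=/etc/hadoop/conf

# The Hadoop config file in that directory that sets fs.default.name
# (core-site.xml on Hadoop 0.20, hadoop-site.xml on older releases)
# would then carry a property along these lines. The host and port
# here are hypothetical:
#
#   <property>
#     <name>fs.default.name</name>
#     <value>hdfs://namenode.example.com:9000</value>
#   </property>
```

If fs.default.name is unset, Hadoop falls back to the local filesystem (file:///), which would be consistent with the file:/chukwa/demuxProcessing/mrInput path in the stack trace.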