Thanks for clarifying the issues. I am glad it is working. I think the
follow-up tasks are:

1. Write clear documentation on the collectors file and the agents file.
2. Still clean up CHUKWA-446 to avoid confusion over the shell scripts, and
   remove the tools/init.d scripts from RPM packaging.

Patch contributions are welcome. :)

Regards,
Eric

On 1/15/10 9:49 AM, "Bill Graham" <billgra...@gmail.com> wrote:

> Just to follow up on my own comment here, I've uncovered the issue. It seems
> bin/start-collectors.sh does correctly read the conf/collectors file for a
> list of hostnames to start the collectors on via ssh.
>
> The sed error I was getting upon running the start script was because the
> conf/collectors file didn't have hostnames in it. Instead it had HTTP URLs to
> the collectors, since that's what the agent expects.
>
> The problem is that the conf/collectors file is used by two different
> processes, and each requires a different format. bin/agents.sh expects URLs to
> the collectors it should communicate with, while bin/start-collectors.sh
> expects a list of hostnames to start collectors on. I'm running each on the
> same host in dev so I got bit by this issue.
>
> In our production environment this will be a non-issue, since we'll have RPMs
> with different configs on different hosts, but this is a source of confusion
> that I thought I should point out.
>
> thanks,
> Bill
>
> On Tue, Jan 12, 2010 at 12:15 PM, Bill Graham <billgra...@gmail.com> wrote:
>> Thanks Eric, that makes sense. I'm going to use the Hadoop-style start-*.sh
>> scripts instead, since they seem to work better for my needs.
>>
>> The agent and data processor start/stop scripts work for me, but the
>> bin/start-collectors.sh script fails:
>>
>> $ sda bin/start-collectors.sh
>> sed: -e expression #1, char 11: unknown option to `s'
>>
>> Has anyone else seen this?
>>
>> It also seems like this script ultimately invokes slaves.sh, which looks for
>> hostnames in conf/agents. Is this a bug or are collectors expected to run on
>> the agent nodes? I thought they were more typically run on the Hadoop data
>> nodes.
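
The sed failure described above can be reproduced in isolation. The sketch below assumes the start script labels each host's output with sed "s/^/$host: /", as Hadoop-style slaves.sh scripts do; that is an assumption about the script, not something confirmed in this thread, and prefix_output and the example host names are hypothetical. With a URL entry the expression becomes s/^/http://..., whose 11th character is the second slash of "http://", which would be consistent with the "char 11" in the reported error.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction. Assumes each host's output is labelled with
# sed "s/^/$entry: /", as Hadoop-style slaves.sh scripts do.

# Prefix every line on stdin with "<entry>: ", using '/' as the sed delimiter.
prefix_output() {
  local entry="$1"
  sed "s/^/$entry: /"
}

# A plain hostname contains no '/', so the expression parses cleanly.
echo "collector started" | prefix_output "collector1.example.com"

# A URL contains '/', which closes the replacement early; sed then treats
# the following characters as flags and rejects the expression.
echo "collector started" | prefix_output "http://collector1.example.com:8080/" \
  || echo "sed rejected the URL entry"
```

If this is indeed the cause, keeping hostnames and collector URLs in separate files would avoid the clash; using a different sed delimiter would only hide it, since ssh would still be handed a URL instead of a hostname.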
>>
>> On Tue, Jan 12, 2010 at 10:50 AM, Eric Yang <ey...@yahoo-inc.com> wrote:
>>> Hi Bill,
>>>
>>> The scripts are overdue for a cleanup. The majority of the scripts were
>>> designed to work in a specific environment for Yahoo.
>>>
>>> On 1/12/10 10:24 AM, "Bill Graham" <billgra...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> To get the scripts in the tools/init.d folder to run in our environment I
>>>> had to make a few of the same tweaks to each of them. I'd like to suggest
>>>> some refactoring of these scripts to make them easier to use. These are
>>>> the issues I came across:
>>>>
>>>> 1. The scripts try to write lock files to /var/lock/subsys, which is owned
>>>> by root. Can we change this location to be somewhere that doesn't require
>>>> root access?
>>>
>>> This was designed to run as an /etc/init.d script. I would recommend
>>> removing the tools/init.d and /etc/init.d scripts and focusing on the
>>> start-*.sh and stop-*.sh commands, as Hadoop does.
>>>
>>>> 2. The actual run command does an su to the CHUKWA_USER which also caused
>>>> problems for us. It seems like it would be cleaner to not embed the su
>>>> calls in the script, but instead allow the user to su when they run the
>>>> script (which worked much better for us). That way everything done by the
>>>> script would be done by the same user.
>>>
>>> Based on previous experience, we had a user who was running everything as
>>> root. The su was put in there to safeguard the program from that user. It
>>> was a pain to clean up hundreds of machines with root-owned files. In a
>>> production environment, it is probably not Chukwa's responsibility to
>>> safeguard permissions. +1 on removing su.
>>>
>>>> 3. Each script has CHUKWA_HOME, CHUKWA_CONF_DIR and CHUKWA_USER hard coded.
>>>> Hard-coded defaults are OK, but the ability to override them without
>>>> modifying the scripts would be ideal.
>>>
>>> The current documentation doesn't mention that those properties are
>>> configurable at build time. If you compile from trunk or the 0.3 branch,
>>> you can copy default.properties to build.properties, and specify the path
>>> and running user. However, the replacement only applies to the
>>> /etc/init.d/chukwa-* scripts and to installs from RPM. (Chukwa has an
>>> "ant rpm" target.) tools/init.d was meant as a template, hence no
>>> replacement took place.
>>>
>>>> Let me know if you think these changes make sense and I'll open a JIRA.
>>>
>>> Yes, open a JIRA, and we can discuss the changes in depth. I would like
>>> to see the /etc/init.d and tools/init.d scripts removed completely, and to
>>> use the start-*.sh and stop-*.sh commands to be consistent with Hadoop.
>>>
>>> Regards,
>>> Eric
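
For the third point in the thread (hard-coded defaults that can be overridden without editing the scripts), one common shell idiom is default-value parameter expansion. This is only a sketch of that idiom, not the change made in Chukwa; the default paths and user name below are placeholders chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: the default values are placeholders, not taken from the
# Chukwa scripts.

# Keep a value already set by the caller; otherwise fall back to a default.
CHUKWA_HOME="${CHUKWA_HOME:-/opt/chukwa}"
CHUKWA_CONF_DIR="${CHUKWA_CONF_DIR:-$CHUKWA_HOME/conf}"
CHUKWA_USER="${CHUKWA_USER:-chukwa}"

echo "home=$CHUKWA_HOME conf=$CHUKWA_CONF_DIR user=$CHUKWA_USER"
```

A caller could then override a single value for one run, for example by prefixing the command with CHUKWA_HOME=/srv/chukwa, while everyone else keeps the built-in defaults.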