This is dynamite and I think it would be very helpful to have it linked to from the website. Although the install and config don't appear too bulky, there are a number of steps, and this would be non-trivial for someone who is not familiar with Hadoop's XML-based runtime configuration. I'm finishing off a patch for Chukwa right now; then I will be building HTrace into my Nutch 2.x search stack. My aim is to write something similar for that deployment, as it would also be very helpful to see tracing for Gora data stores.
On Monday, March 2, 2015, Colin P. McCabe <[email protected]> wrote:

> A few people have asked how to get started with HTrace development. It's a
> good question, and we don't have a great README up about it, so I thought
> I would write something.
>
> HTrace is all about tracing distributed systems. So the best way to get
> started is to plug HTrace into your favorite distributed system and see
> what cool things happen or what bugs pop up. Since I'm an HDFS developer,
> that's the distributed system I'm most familiar with, so I will do a quick
> writeup about how to use HTrace + HDFS. (HBase + HTrace is another very
> important use case that I would like to write about later, but one step at
> a time.)
>
> Just a quick note: a lot of this software is relatively new, so you may
> run into bugs or integration pain points.
>
> There has not yet been a stable release of Hadoop that contained Apache
> HTrace. There have been releases that contained the pre-Apache version of
> HTrace, but that's no fun. If we want to do development, we want to be
> able to run the latest version of the code, so we will have to build it
> ourselves.
>
> Building HTrace is not too bad. First we install the dependencies:
>
> > cmccabe@keter:~/> apt-get install java javac google-go leveldb-devel
>
> If you have a different Linux distro, this command will vary slightly, of
> course. On Macs, "brew" is a good option.
>
> Next we use Maven to build the source:
>
> > cmccabe@keter:~/> git clone https://git-wip-us.apache.org/repos/asf/incubator-htrace.git
> > cmccabe@keter:~/> cd incubator-htrace
> > cmccabe@keter:~/> git checkout master
> > cmccabe@keter:~/> mvn install -DskipTests -Dmaven.javadoc.skip=true -Drat.skip
>
> OK. So HTrace is built and installed to the local ~/.m2 directory. We
> should see it under the .m2:
>
> > cmccabe@keter:~/> find ~/.m2 | grep htrace-core
> > ...
> > /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT
> > /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.jar.lastUpdated
> > /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.pom.lastUpdated
> > ...
>
> The version you built should be 3.2.0-SNAPSHOT.
>
> Next, we check out Hadoop:
>
> > cmccabe@keter:~/> git clone https://git-wip-us.apache.org/repos/asf/hadoop.git
> > cmccabe@keter:~/> cd hadoop
> > cmccabe@keter:~/> git checkout branch-2
>
> So we are basically building a pre-release version of Hadoop 2.7,
> currently known as branch-2. We will need to modify Hadoop to use
> 3.2.0-incubating-SNAPSHOT rather than the stable 3.1.0-incubating release
> which it would ordinarily use in branch-2. I applied this diff to
> hadoop-project/pom.xml:
>
> > diff --git a/hadoop-project/pom.xml b/hadoop-project/pom.xml
> > index 569b292..5b7e466 100644
> > --- a/hadoop-project/pom.xml
> > +++ b/hadoop-project/pom.xml
> > @@ -785,7 +785,7 @@
> >        <dependency>
> >          <groupId>org.apache.htrace</groupId>
> >          <artifactId>htrace-core</artifactId>
> > -        <version>3.1.0-incubating</version>
> > +        <version>3.2.0-incubating-SNAPSHOT</version>
> >        </dependency>
> >        <dependency>
> >          <groupId>org.jdom</groupId>
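A quick sanity check is worth doing here before the long Hadoop build: you can ask Maven which htrace-core version branch-2 now resolves. This is a sketch of mine rather than part of Colin's writeup; the module picked and the grep are just illustrative:

    mvn dependency:tree -pl hadoop-common-project/hadoop-common | grep htrace

If this still prints 3.1.0-incubating, the pom.xml change above did not take effect.
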
> Next, I built Hadoop:
>
> > cmccabe@keter:~/> mvn package -Pdist -DskipTests -Dmaven.javadoc.skip=true
>
> You should get a package with Hadoop jars named like so:
>
> > ...
> > ./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-codec-1.4.jar
> > ./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar
> > ...
>
> This package should also contain an htrace-3.2.0-SNAPSHOT jar.
>
> OK, so how can we start seeing some trace spans? The easiest way is to
> configure LocalFileSpanReceiver. Add this to your hdfs-site.xml:
>
> > <property>
> >   <name>hadoop.htrace.spanreceiver.classes</name>
> >   <value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
> > </property>
> > <property>
> >   <name>hadoop.htrace.sampler</name>
> >   <value>AlwaysSampler</value>
> > </property>
>
> When you run the Hadoop daemons, you should see them writing to files
> named /tmp/${PROCESS_ID} (one file per process). If this doesn't happen,
> try cranking up your log4j level to TRACE to see why the SpanReceiver
> could not be created. You should see something like this in the log4j
> logs:
>
> > 13:28:33,885 TRACE SpanReceiverBuilder:94 - Created new span receiver of type org.apache.htrace.impl.LocalFileSpanReceiver
> >   at org.apache.htrace.SpanReceiverBuilder.build(SpanReceiverBuilder.java:92)
> >   at org.apache.hadoop.tracing.SpanReceiverHost.loadInstance(SpanReceiverHost.java:161)
> >   at org.apache.hadoop.tracing.SpanReceiverHost.loadSpanReceivers(SpanReceiverHost.java:147)
> >   at org.apache.hadoop.tracing.SpanReceiverHost.getInstance(SpanReceiverHost.java:82)
>
> Running htraced is easy. You simply run the binary, and you should see
> messages like this:
>
> > cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htraced -Dlog.level=TRACE -Ddata.store.clear
> > 2015-03-02T19:08:33-08:00 D: HTRACED_CONF_DIR=/home/cmccabe/conf
> > 2015-03-02T19:08:33-08:00 D: data.store.clear = true
> > 2015-03-02T19:08:33-08:00 D: log.level = TRACE
> > 2015-03-02T19:08:33-08:00 I: Cleared existing datastore directory /tmp/htrace1/db
> > 2015-03-02T19:08:33-08:00 D: LevelDB failed to open /tmp/htrace1/db: Invalid argument: /tmp/htrace1/db: does not exist (create_if_missing is false)
> > 2015-03-02T19:08:33-08:00 I: Created new LevelDB instance in /tmp/htrace1/db
> > 2015-03-02T19:08:33-08:00 T: Wrote layout version 2 to shard at /tmp/htrace1/db.
> > 2015-03-02T19:08:33-08:00 I: Cleared existing datastore directory /tmp/htrace2/db
> > 2015-03-02T19:08:33-08:00 D: LevelDB failed to open /tmp/htrace2/db: Invalid argument: /tmp/htrace2/db: does not exist (create_if_missing is false)
> > 2015-03-02T19:08:33-08:00 I: Created new LevelDB instance in /tmp/htrace2/db
> > 2015-03-02T19:08:33-08:00 T: Wrote layout version 2 to shard at /tmp/htrace2/db.
> > ...
>
> Similar to the Hadoop daemons, htraced can be configured either through an
> XML file named htraced-conf.xml (found in a location pointed to by
> HTRACED_CONF_DIR), or by passing -Dkey=value flags on the command line.
>
> Let's check out the htrace command:
>
> > cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htrace serverInfo
> > HTraced server version 3.2.0-incubating-SNAPSHOT (5c0a712c7dd4263f5e2a88d4c61a0facab25953f)
>
> "serverInfo" queries the htraced server via REST and gets back a response.
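For anyone who wants to emit spans from their own code rather than through Hadoop's configuration hooks, the htrace-core Java API is small. Below is a minimal sketch; the class names come from the htrace-core jar discussed above, but the config key, the fromMap helper, and the LocalFileSpanReceiver constructor are assumptions on my part, so verify them against the sources you just built:

    import java.util.Collections;

    import org.apache.htrace.HTraceConfiguration;
    import org.apache.htrace.Sampler;
    import org.apache.htrace.Trace;
    import org.apache.htrace.TraceScope;
    import org.apache.htrace.impl.LocalFileSpanReceiver;

    public class TraceDemo {
      public static void main(String[] args) throws Exception {
        // Deliver finished spans to a local file, mirroring the
        // LocalFileSpanReceiver setup in hdfs-site.xml above.
        // (The key name and constructor signature are assumptions;
        // check LocalFileSpanReceiver in the release you built.)
        HTraceConfiguration conf = HTraceConfiguration.fromMap(
            Collections.singletonMap("local-file-span-receiver.path",
                                     "/tmp/demo-spans"));
        Trace.addReceiver(new LocalFileSpanReceiver(conf));

        // Sampler.ALWAYS plays the role of the AlwaysSampler configured
        // in hdfs-site.xml: every startSpan call produces a real span.
        TraceScope scope = Trace.startSpan("demoOperation", Sampler.ALWAYS);
        try {
          // Traced work goes here; spans started on this thread while the
          // scope is open become children of "demoOperation".
        } finally {
          scope.close();  // closing the scope hands the span to the receiver
        }
      }
    }
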
> For help using the htrace command, we can run:
>
> > cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htrace --help
> > usage: ./htrace-core/src/go/build/htrace [<flags>] <command> [<flags>] [<args> ...]
> >
> > The Apache HTrace command-line tool. This tool retrieves and modifies
> > settings and other data on a running htraced daemon.
> >
> > If we find an htraced-conf.xml configuration file in the list of
> > directories specified in HTRACED_CONF_DIR, we will use that
> > configuration; otherwise, the defaults will be used.
> >
> > Flags:
> >   --help                Show help.
> >   --Dmy.key="my.value"  Set configuration key 'my.key' to 'my.value'.
> >                         Replace 'my.key' with any key you want to set.
> >   --addr=ADDR           Server address.
> >   --verbose             Verbose.
> >
> > Commands:
> >   help [<command>]
> >     Show help for a command.
> > ...
>
> We can load spans into the htraced daemon from a text file using
> ./build/htrace loadSpans [file-path], and dump the span information using
> ./build/htrace dumpAll.
>
> Now, at this point, we would like our htraced client (Hadoop) to send
> spans directly to htraced, rather than dumping them to a local file. To
> make this work, we will need to put the htrace-htraced jar on the Hadoop
> CLASSPATH.
>
> There is probably a better way to do it by setting HADOOP_CLASSPATH, but
> this simple script just puts the jar on every part of the Hadoop CLASSPATH
> I could think of where it might need to be:
>
> > #!/bin/bash
> >
> > # Copy the installed version of htrace-core to the correct hadoop jar locations
> > cat << EOF | xargs -n 1 cp /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-incubating-SNAPSHOT/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> > /home/cmccabe/hadoop-install/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> > /home/cmccabe/hadoop-install/share/hadoop/hdfs/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> > /home/cmccabe/hadoop-install/share/hadoop/tools/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> > /home/cmccabe/hadoop-install/share/hadoop/common/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> > /home/cmccabe/hadoop-install/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> > EOF
> >
> > # Copy the installed version of htrace-htraced to the correct hadoop jar locations
> > cat << EOF | xargs -n 1 cp /home/cmccabe/.m2/repository/org/apache/htrace/htrace-htraced/3.2.0-incubating-SNAPSHOT/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
> > /home/cmccabe/hadoop-install/share/hadoop/hdfs/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
> > /home/cmccabe/hadoop-install/share/hadoop/tools/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
> > /home/cmccabe/hadoop-install/share/hadoop/common/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
> > EOF
>
> At this point, I changed hdfs-site.xml so that
> hadoop.htrace.spanreceiver.classes was set to the htraced span receiver:
>
> > <property>
> >   <name>hadoop.htrace.spanreceiver.classes</name>
> >   <value>org.apache.htrace.impl.HTracedRESTReceiver</value>
> > </property>
> > <property>
> >   <name>htraced.rest.url</name>
> >   <value>http://lumbergh.initech.com:9095/</value>
> > </property>
>
> Obviously, set htraced.rest.url to the host on which you are running
> htraced. This setup should work for sending spans to htraced.
>
> To see the web UI, point your web browser at
> http://lumbergh.initech.com:9095/ (or whatever the host name is where you
> are running htraced).
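Once the daemons have been restarted with the new configuration, the htrace CLI from earlier gives a quick end-to-end check. The --addr flag is listed in the --help output above; I am assuming it applies to these subcommands and takes the same host and port as htraced.rest.url:

    ./htrace-core/src/go/build/htrace --addr=lumbergh.initech.com:9095 serverInfo
    ./htrace-core/src/go/build/htrace --addr=lumbergh.initech.com:9095 dumpAll

If serverInfo answers, the daemon is reachable; if dumpAll starts printing spans after you run an ordinary HDFS command such as hadoop fs -ls /, spans are flowing from Hadoop into htraced.
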
> I hope this helps some folks out. Hopefully building Hadoop and massaging
> the classpath is not too bad. This install process will improve in the
> future as more projects get stable releases with HTrace. There has also
> been some discussion of making Docker images, which might help new
> developers get started.
>
> best,
> Colin

--
*Lewis*
