Is the website on CMS?

On Thu, Mar 5, 2015 at 5:18 PM, Colin P. McCabe <[email protected]> wrote:
> Can we set up a wiki? Stuff like this needs to be updated
> periodically and it would be nice to have something like the Hadoop
> wiki. Of course there may be some out-of-date stuff from time to
> time, but it's better than nothing...
>
> On Mon, Mar 2, 2015 at 8:52 PM, Lewis John Mcgibbney
> <[email protected]> wrote:
> > This is dynamite and I think it would be very helpful to have it
> > linked to from the website.
> > Although the install and config don't appear too bulky, there are a
> > number of steps, and this would be non-trivial for someone who is not
> > familiar with Hadoop's XML-based runtime configuration.
> > I'm finishing off a patch for Chukwa right now, then I will be
> > building HTrace into my Nutch 2.x search stack. My aim is to write
> > something similar for that deployment, as it would also be very
> > helpful to see tracing for Gora data stores.
>
> Awesome.
>
> best,
> Colin
>
> > On Monday, March 2, 2015, Colin P. McCabe <[email protected]> wrote:
> >
> >> A few people have asked how to get started with HTrace development.
> >> It's a good question and we don't have a great README up about it, so
> >> I thought I would write something.
> >>
> >> HTrace is all about tracing distributed systems. So the best way to
> >> get started is to plug HTrace into your favorite distributed system
> >> and see what cool things happen or what bugs pop up. Since I'm an HDFS
> >> developer, that's the distributed system I'm most familiar with, so I
> >> will do a quick writeup about how to use HTrace + HDFS. (HBase +
> >> HTrace is another very important use case that I would like to write
> >> about later, but one step at a time.)
> >>
> >> Just a quick note: a lot of this software is relatively new, so there
> >> may be bugs or integration pain points that you encounter.
> >>
> >> There has not yet been a stable release of Hadoop that contained
> >> Apache HTrace. There have been releases that contained the pre-Apache
> >> version of HTrace, but that's no fun. If we want to do development, we
> >> want to be able to run the latest version of the code, so we will have
> >> to build it ourselves.
> >>
> >> Building HTrace is not too bad. First we install the dependencies:
> >>
> >> > cmccabe@keter:~/> apt-get install java javac google-go leveldb-devel
> >>
> >> If you have a different Linux distro, this command will vary slightly,
> >> of course. On Macs, "brew" is a good option.
> >>
> >> Next we use Maven to build the source:
> >>
> >> > cmccabe@keter:~/> git clone https://git-wip-us.apache.org/repos/asf/incubator-htrace.git
> >> > cmccabe@keter:~/> cd incubator-htrace
> >> > cmccabe@keter:~/> git checkout master
> >> > cmccabe@keter:~/> mvn install -DskipTests -Dmaven.javadoc.skip=true -Drat.skip
> >>
> >> OK. So HTrace is built and installed to the local ~/.m2 directory.
> >> We should see it under the .m2:
> >>
> >> > cmccabe@keter:~/> find ~/.m2 | grep htrace-core
> >> > ...
> >> > /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT
> >> > /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.jar.lastUpdated
> >> > /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.pom.lastUpdated
> >> > ...
> >>
> >> The version you built should be 3.2.0-SNAPSHOT.
> >>
> >> Next, we check out Hadoop:
> >>
> >> > cmccabe@keter:~/> git clone https://git-wip-us.apache.org/repos/asf/hadoop.git
> >> > cmccabe@keter:~/> cd hadoop
> >> > cmccabe@keter:~/> git checkout branch-2
> >>
> >> So we are basically building a pre-release version of Hadoop 2.7,
> >> currently known as branch-2. We will need to modify Hadoop to use
> >> 3.2.0-SNAPSHOT rather than the stable 3.1.0 release that branch-2
> >> would ordinarily use. I applied this diff to hadoop-project/pom.xml:
> >>
> >> > diff --git a/hadoop-project/pom.xml b/hadoop-project/pom.xml
> >> > index 569b292..5b7e466 100644
> >> > --- a/hadoop-project/pom.xml
> >> > +++ b/hadoop-project/pom.xml
> >> > @@ -785,7 +785,7 @@
> >> >        <dependency>
> >> >          <groupId>org.apache.htrace</groupId>
> >> >          <artifactId>htrace-core</artifactId>
> >> > -        <version>3.1.0-incubating</version>
> >> > +        <version>3.2.0-incubating-SNAPSHOT</version>
> >> >        </dependency>
> >> >        <dependency>
> >> >          <groupId>org.jdom</groupId>
> >>
> >> Next, I built Hadoop:
> >>
> >> > cmccabe@keter:~/> mvn package -Pdist -DskipTests -Dmaven.javadoc.skip=true
> >>
> >> You should get a package with Hadoop jars named like so:
> >>
> >> > ...
> >> > ./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-codec-1.4.jar
> >> > ./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar
> >> > ...
> >>
> >> This package should also contain an htrace-3.2.0-SNAPSHOT jar.
> >>
> >> OK, so how can we start seeing some trace spans? The easiest way is to
> >> configure LocalFileSpanReceiver. Add this to your hdfs-site.xml:
> >>
> >> > <property>
> >> >   <name>hadoop.htrace.spanreceiver.classes</name>
> >> >   <value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
> >> > </property>
> >> > <property>
> >> >   <name>hadoop.htrace.sampler</name>
> >> >   <value>AlwaysSampler</value>
> >> > </property>
> >>
> >> When you run the Hadoop daemons, you should see them writing to files
> >> named /tmp/${PROCESS_ID} (one for each process). If this doesn't
> >> happen, try cranking up your log4j level to TRACE to see why the
> >> SpanReceiver could not be created.
> >>
> >> You should see something like this in the log4j logs:
> >>
> >> > 13:28:33,885 TRACE SpanReceiverBuilder:94 - Created new span receiver of type org.apache.htrace.impl.LocalFileSpanReceiver
> >> >   at org.apache.htrace.SpanReceiverBuilder.build(SpanReceiverBuilder.java:92)
> >> >   at org.apache.hadoop.tracing.SpanReceiverHost.loadInstance(SpanReceiverHost.java:161)
> >> >   at org.apache.hadoop.tracing.SpanReceiverHost.loadSpanReceivers(SpanReceiverHost.java:147)
> >> >   at org.apache.hadoop.tracing.SpanReceiverHost.getInstance(SpanReceiverHost.java:82)
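(To have something show up in those files, you need traced activity. One
way is to wrap an HDFS client call in a trace scope yourself. This is a
sketch only, assuming the branch-2 FileSystem API and the htrace 3.x
classes above; the path /tmp/demo.txt and the span name are placeholders,
and it assumes the hdfs-site.xml with the span receiver config is on the
client's classpath.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.htrace.Sampler;
    import org.apache.htrace.Trace;
    import org.apache.htrace.TraceScope;

    public class TracedRead {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Work done inside this scope, including the RPCs the DFS
        // client makes, should be recorded as child spans.
        TraceScope scope = Trace.startSpan("tracedReadDemo", Sampler.ALWAYS);
        try (FSDataInputStream in = fs.open(new Path("/tmp/demo.txt"))) {
          in.read();
        } finally {
          scope.close();
        }
      }
    }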
> >>
> >> Running htraced is easy. You simply run the binary:
> >>
> >> > cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htraced -Dlog.level=TRACE -Ddata.store.clear
> >>
> >> You should see messages like this:
> >>
> >> > 2015-03-02T19:08:33-08:00 D: HTRACED_CONF_DIR=/home/cmccabe/conf
> >> > 2015-03-02T19:08:33-08:00 D: data.store.clear = true
> >> > 2015-03-02T19:08:33-08:00 D: log.level = TRACE
> >> > 2015-03-02T19:08:33-08:00 I: Cleared existing datastore directory /tmp/htrace1/db
> >> > 2015-03-02T19:08:33-08:00 D: LevelDB failed to open /tmp/htrace1/db: Invalid argument: /tmp/htrace1/db: does not exist (create_if_missing is false)
> >> > 2015-03-02T19:08:33-08:00 I: Created new LevelDB instance in /tmp/htrace1/db
> >> > 2015-03-02T19:08:33-08:00 T: Wrote layout version 2 to shard at /tmp/htrace1/db.
> >> > 2015-03-02T19:08:33-08:00 I: Cleared existing datastore directory /tmp/htrace2/db
> >> > 2015-03-02T19:08:33-08:00 D: LevelDB failed to open /tmp/htrace2/db: Invalid argument: /tmp/htrace2/db: does not exist (create_if_missing is false)
> >> > 2015-03-02T19:08:33-08:00 I: Created new LevelDB instance in /tmp/htrace2/db
> >> > 2015-03-02T19:08:33-08:00 T: Wrote layout version 2 to shard at /tmp/htrace2/db.
> >> > ...
> >>
> >> Similar to the Hadoop daemons, htraced can be configured either
> >> through an XML file named htraced-conf.xml (found in a location
> >> pointed to by HTRACED_CONF_DIR), or by passing -Dkey=value flags on
> >> the command line.
> >>
> >> Let's check out the htrace command:
> >>
> >> > cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htrace serverInfo
> >> > HTraced server version 3.2.0-incubating-SNAPSHOT (5c0a712c7dd4263f5e2a88d4c61a0facab25953f)
> >>
> >> "serverInfo" queries the htraced server via REST and gets back a
> >> response. For help using the htrace command, we can run:
> >>
> >> > cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htrace --help
> >> > usage: ./htrace-core/src/go/build/htrace [<flags>] <command> [<flags>] [<args> ...]
> >> >
> >> > The Apache HTrace command-line tool. This tool retrieves and modifies settings and other data on a running htraced daemon.
> >> >
> >> > If we find an htraced-conf.xml configuration file in the list of directories specified in HTRACED_CONF_DIR, we will use that configuration; otherwise, the defaults will be used.
> >> >
> >> > Flags:
> >> >   --help  Show help.
> >> >   --Dmy.key="my.value"
> >> >           Set configuration key 'my.key' to 'my.value'. Replace 'my.key' with any key you want to set.
> >> >   --addr=ADDR  Server address.
> >> >   --verbose    Verbose.
> >> >
> >> > Commands:
> >> >   help [<command>]
> >> >     Show help for a command.
> >> >   ...
> >>
> >> We can load spans into the htraced daemon from a text file using
> >> ./build/htrace loadSpans [file-path], and dump the span information
> >> using ./build/htrace dumpAll.
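(Under the hood, "serverInfo" is just an HTTP GET against htraced's REST
port. If you would rather poke at the daemon from Java, something like
the sketch below should work. Note the assumptions: the /server/info
path is a guess at htraced's REST layout, and localhost:9095 matches the
default port used elsewhere in this thread.)

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class HtracedPing {
      public static void main(String[] args) throws Exception {
        // ASSUMPTION: htraced is listening on localhost:9095 and serves
        // server info at /server/info.
        URL url = new URL("http://localhost:9095/server/info");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
          String line;
          while ((line = reader.readLine()) != null) {
            System.out.println(line);  // JSON describing the server version
          }
        } finally {
          conn.disconnect();
        }
      }
    }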
> >>
> >> Now, at this point, we would like our HTrace client (Hadoop) to send
> >> spans directly to htraced, rather than dumping them to a local file.
> >> To make this work, we will need to put the htrace-htraced jar on the
> >> Hadoop CLASSPATH.
> >>
> >> There is probably a better way to do it by setting HADOOP_CLASSPATH,
> >> but this simple script just puts the jar on every part of the Hadoop
> >> CLASSPATH I could think of where it might need to be:
> >>
> >> > #!/bin/bash
> >> >
> >> > # Copy the installed version of htrace-core to the correct hadoop
> >> > # jar locations (xargs passes each destination to cp in turn).
> >> > cat << EOF | xargs -n 1 cp /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-incubating-SNAPSHOT/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> >> > /home/cmccabe/hadoop-install/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> >> > /home/cmccabe/hadoop-install/share/hadoop/hdfs/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> >> > /home/cmccabe/hadoop-install/share/hadoop/tools/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> >> > /home/cmccabe/hadoop-install/share/hadoop/common/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> >> > /home/cmccabe/hadoop-install/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> >> > EOF
> >> >
> >> > # Copy the installed version of htrace-htraced to the correct hadoop
> >> > # jar locations.
> >> > cat << EOF | xargs -n 1 cp /home/cmccabe/.m2/repository/org/apache/htrace/htrace-htraced/3.2.0-incubating-SNAPSHOT/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
> >> > /home/cmccabe/hadoop-install/share/hadoop/hdfs/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
> >> > /home/cmccabe/hadoop-install/share/hadoop/tools/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
> >> > /home/cmccabe/hadoop-install/share/hadoop/common/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
> >> > EOF
> >>
> >> At this point, I changed hdfs-site.xml so that
> >> hadoop.htrace.spanreceiver.classes was set to the htraced span
> >> receiver:
> >>
> >> > <property>
> >> >   <name>hadoop.htrace.spanreceiver.classes</name>
> >> >   <value>org.apache.htrace.impl.HTracedRESTReceiver</value>
> >> > </property>
> >> > <property>
> >> >   <name>htraced.rest.url</name>
> >> >   <value>http://lumbergh.initech.com:9095/</value>
> >> > </property>
> >>
> >> Obviously, set htraced.rest.url to the host on which you are running
> >> htraced. This setup should work for sending spans to htraced.
> >>
> >> To see the web UI, point your web browser at
> >> http://lumbergh.initech.com:9095/ (or whatever the host name is where
> >> htraced is running for you).
> >>
> >> I hope this helps some folks out. Hopefully building Hadoop and
> >> massaging the classpath is not too bad. This install process will
> >> improve in the future, as more projects get stable releases with
> >> HTrace. There has also been some discussion of making Docker images,
> >> which might help new developers get started.
> >>
> >> best,
> >> Colin
> >
> > --
> > *Lewis*

--
*Lewis*
