This is dynamite and I think it would be very helpful to have it linked to from the website. Although the install and config don't appear too bulky, there are a number of steps, and this would be non-trivial for someone who is not familiar with Hadoop's XML-based runtime configuration. I'm finishing off a patch for Chukwa right now; then I will be building HTrace into my Nutch 2.x search stack. My aim is to write something similar for that deployment, as it would also be very helpful to see tracing for Gora data stores.
On Monday, March 2, 2015, Colin P. McCabe <[email protected]> wrote:

> A few people have asked how to get started with HTrace development. It's a
> good question, and we don't have a great README up about it, so I thought
> I would write something.
>
> HTrace is all about tracing distributed systems. So the best way to get
> started is to plug HTrace into your favorite distributed system and see
> what cool things happen or what bugs pop up. Since I'm an HDFS developer,
> that's the distributed system I'm most familiar with, so I will do a quick
> writeup about how to use HTrace + HDFS. (HBase + HTrace is another very
> important use case that I would like to write about later, but one step at
> a time.)
>
> Just a quick note: a lot of this software is relatively new, so you may
> run into bugs or integration pain points.
>
> There has not yet been a stable release of Hadoop that contained Apache
> HTrace. There have been releases that contained the pre-Apache version of
> HTrace, but that's no fun. If we want to do development, we want to be
> able to run the latest version of the code, so we will have to build it
> ourselves.
>
> Building HTrace is not too bad. First we install the dependencies:
>
> > cmccabe@keter:~/> apt-get install java javac google-go leveldb-devel
>
> If you have a different Linux distro, this command will vary slightly, of
> course. On Macs, "brew" is a good option.
>
> Next we use Maven to build the source:
>
> > cmccabe@keter:~/> git clone https://git-wip-us.apache.org/repos/asf/incubator-htrace.git
> > cmccabe@keter:~/> cd incubator-htrace
> > cmccabe@keter:~/> git checkout master
> > cmccabe@keter:~/> mvn install -DskipTests -Dmaven.javadoc.skip=true -Drat.skip
>
> OK. So HTrace is built and installed to the local ~/.m2 directory. We
> should see it under the .m2:
>
> > cmccabe@keter:~/> find ~/.m2 | grep htrace-core
> > ...
> > /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT
> > /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.jar.lastUpdated
> > /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.pom.lastUpdated
> > ...
>
> The version you built should be 3.2.0-SNAPSHOT.
>
> Next, we check out Hadoop:
>
> > cmccabe@keter:~/> git clone https://git-wip-us.apache.org/repos/asf/hadoop.git
> > cmccabe@keter:~/> cd hadoop
> > cmccabe@keter:~/> git checkout branch-2
>
> So we are basically building a pre-release version of Hadoop 2.7,
> currently known as branch-2. We will need to modify Hadoop to use
> 3.2.0-incubating-SNAPSHOT rather than the stable 3.1.0-incubating release
> which it would ordinarily use in branch-2. I applied this diff to
> hadoop-project/pom.xml:
>
> > diff --git a/hadoop-project/pom.xml b/hadoop-project/pom.xml
> > index 569b292..5b7e466 100644
> > --- a/hadoop-project/pom.xml
> > +++ b/hadoop-project/pom.xml
> > @@ -785,7 +785,7 @@
> >        <dependency>
> >          <groupId>org.apache.htrace</groupId>
> >          <artifactId>htrace-core</artifactId>
> > -        <version>3.1.0-incubating</version>
> > +        <version>3.2.0-incubating-SNAPSHOT</version>
> >        </dependency>
> >        <dependency>
> >          <groupId>org.jdom</groupId>
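A quick sanity check is worth doing here before the long Hadoop build: you can ask Maven which htrace-core version branch-2 now resolves. This is a sketch of mine rather than part of Colin's writeup; the module picked and the grep are just illustrative:

    mvn dependency:tree -pl hadoop-common-project/hadoop-common | grep htrace

If this still prints 3.1.0-incubating, the pom.xml change above did not take effect.
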
> Next, I built Hadoop:
>
> > cmccabe@keter:~/> mvn package -Pdist -DskipTests -Dmaven.javadoc.skip=true
>
> You should get a package with Hadoop jars named like so:
>
> > ...
> > ./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-codec-1.4.jar
> > ./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar
> > ...
>
> This package should also contain an htrace-3.2.0-SNAPSHOT jar.
>
> OK, so how can we start seeing some trace spans? The easiest way is to
> configure LocalFileSpanReceiver. Add this to your hdfs-site.xml:
>
> > <property>
> >   <name>hadoop.htrace.spanreceiver.classes</name>
> >   <value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
> > </property>
> > <property>
> >   <name>hadoop.htrace.sampler</name>
> >   <value>AlwaysSampler</value>
> > </property>
>
> When you run the Hadoop daemons, you should see them writing to files
> named /tmp/${PROCESS_ID} (one file per process). If this doesn't happen,
> try cranking up your log4j level to TRACE to see why the SpanReceiver
> could not be created. You should see something like this in the log4j
> logs:
>
> > 13:28:33,885 TRACE SpanReceiverBuilder:94 - Created new span receiver of type org.apache.htrace.impl.LocalFileSpanReceiver
> >   at org.apache.htrace.SpanReceiverBuilder.build(SpanReceiverBuilder.java:92)
> >   at org.apache.hadoop.tracing.SpanReceiverHost.loadInstance(SpanReceiverHost.java:161)
> >   at org.apache.hadoop.tracing.SpanReceiverHost.loadSpanReceivers(SpanReceiverHost.java:147)
> >   at org.apache.hadoop.tracing.SpanReceiverHost.getInstance(SpanReceiverHost.java:82)
>
> Running htraced is easy. You simply run the binary, and you should see
> messages like this:
>
> > cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htraced -Dlog.level=TRACE -Ddata.store.clear
> > 2015-03-02T19:08:33-08:00 D: HTRACED_CONF_DIR=/home/cmccabe/conf
> > 2015-03-02T19:08:33-08:00 D: data.store.clear = true
> > 2015-03-02T19:08:33-08:00 D: log.level = TRACE
> > 2015-03-02T19:08:33-08:00 I: Cleared existing datastore directory /tmp/htrace1/db
> > 2015-03-02T19:08:33-08:00 D: LevelDB failed to open /tmp/htrace1/db: Invalid argument: /tmp/htrace1/db: does not exist (create_if_missing is false)
> > 2015-03-02T19:08:33-08:00 I: Created new LevelDB instance in /tmp/htrace1/db
> > 2015-03-02T19:08:33-08:00 T: Wrote layout version 2 to shard at /tmp/htrace1/db.
> > 2015-03-02T19:08:33-08:00 I: Cleared existing datastore directory /tmp/htrace2/db
> > 2015-03-02T19:08:33-08:00 D: LevelDB failed to open /tmp/htrace2/db: Invalid argument: /tmp/htrace2/db: does not exist (create_if_missing is false)
> > 2015-03-02T19:08:33-08:00 I: Created new LevelDB instance in /tmp/htrace2/db
> > 2015-03-02T19:08:33-08:00 T: Wrote layout version 2 to shard at /tmp/htrace2/db.
> > ...
>
> Similar to the Hadoop daemons, htraced can be configured either through an
> XML file named htraced-conf.xml (found in a location pointed to by
> HTRACED_CONF_DIR), or by passing -Dkey=value flags on the command line.
>
> Let's check out the htrace command:
>
> > cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htrace serverInfo
> > HTraced server version 3.2.0-incubating-SNAPSHOT (5c0a712c7dd4263f5e2a88d4c61a0facab25953f)
>
> "serverInfo" queries the htraced server via REST and gets back a response.
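For anyone who wants to emit spans from their own code rather than through Hadoop's configuration hooks, the htrace-core Java API is small. Below is a minimal sketch; the class names come from the htrace-core jar discussed above, but the config key, the fromMap helper, and the LocalFileSpanReceiver constructor are assumptions on my part, so verify them against the sources you just built:

    import java.util.Collections;

    import org.apache.htrace.HTraceConfiguration;
    import org.apache.htrace.Sampler;
    import org.apache.htrace.Trace;
    import org.apache.htrace.TraceScope;
    import org.apache.htrace.impl.LocalFileSpanReceiver;

    public class TraceDemo {
      public static void main(String[] args) throws Exception {
        // Deliver finished spans to a local file, mirroring the
        // LocalFileSpanReceiver setup in hdfs-site.xml above.
        // (The key name and constructor signature are assumptions;
        // check LocalFileSpanReceiver in the release you built.)
        HTraceConfiguration conf = HTraceConfiguration.fromMap(
            Collections.singletonMap("local-file-span-receiver.path",
                                     "/tmp/demo-spans"));
        Trace.addReceiver(new LocalFileSpanReceiver(conf));

        // Sampler.ALWAYS plays the role of the AlwaysSampler configured
        // in hdfs-site.xml: every startSpan call produces a real span.
        TraceScope scope = Trace.startSpan("demoOperation", Sampler.ALWAYS);
        try {
          // Traced work goes here; spans started on this thread while the
          // scope is open become children of "demoOperation".
        } finally {
          scope.close();  // closing the scope hands the span to the receiver
        }
      }
    }
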
> For help using the htrace command, we can run:
>
> > cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htrace --help
> > usage: ./htrace-core/src/go/build/htrace [<flags>] <command> [<flags>] [<args> ...]
> >
> > The Apache HTrace command-line tool. This tool retrieves and modifies
> > settings and other data on a running htraced daemon.
> >
> > If we find an htraced-conf.xml configuration file in the list of
> > directories specified in HTRACED_CONF_DIR, we will use that
> > configuration; otherwise, the defaults will be used.
> >
> > Flags:
> >   --help                Show help.
> >   --Dmy.key="my.value"  Set configuration key 'my.key' to 'my.value'.
> >                         Replace 'my.key' with any key you want to set.
> >   --addr=ADDR           Server address.
> >   --verbose             Verbose.
> >
> > Commands:
> >   help [<command>]
> >     Show help for a command.
> > ...
>
> We can load spans into the htraced daemon from a text file using
> ./build/htrace loadSpans [file-path], and dump the span information using
> ./build/htrace dumpAll.
>
> Now, at this point, we would like our htraced client (Hadoop) to send
> spans directly to htraced, rather than dumping them to a local file. To
> make this work, we will need to put the htrace-htraced jar on the Hadoop
> CLASSPATH.
>
> There is probably a better way to do it by setting HADOOP_CLASSPATH, but
> this simple script just puts the jar on every part of the Hadoop CLASSPATH
> I could think of where it might need to be:
>
> > #!/bin/bash
> >
> > # Copy the installed version of htrace-core to the correct hadoop jar locations
> > cat << EOF | xargs -n 1 cp /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-incubating-SNAPSHOT/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> > /home/cmccabe/hadoop-install/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> > /home/cmccabe/hadoop-install/share/hadoop/hdfs/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> > /home/cmccabe/hadoop-install/share/hadoop/tools/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> > /home/cmccabe/hadoop-install/share/hadoop/common/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> > /home/cmccabe/hadoop-install/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> > EOF
> >
> > # Copy the installed version of htrace-htraced to the correct hadoop jar locations
> > cat << EOF | xargs -n 1 cp /home/cmccabe/.m2/repository/org/apache/htrace/htrace-htraced/3.2.0-incubating-SNAPSHOT/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
> > /home/cmccabe/hadoop-install/share/hadoop/hdfs/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
> > /home/cmccabe/hadoop-install/share/hadoop/tools/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
> > /home/cmccabe/hadoop-install/share/hadoop/common/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
> > EOF
>
> At this point, I changed hdfs-site.xml so that
> hadoop.htrace.spanreceiver.classes was set to the htraced span receiver:
>
> > <property>
> >   <name>hadoop.htrace.spanreceiver.classes</name>
> >   <value>org.apache.htrace.impl.HTracedRESTReceiver</value>
> > </property>
> > <property>
> >   <name>htraced.rest.url</name>
> >   <value>http://lumbergh.initech.com:9095/</value>
> > </property>
>
> Obviously, set htraced.rest.url to the host on which you are running
> htraced. This setup should work for sending spans to htraced.
>
> To see the web UI, point your web browser at
> http://lumbergh.initech.com:9095/ (or whatever the host name is where you
> are running htraced).
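Once the daemons have been restarted with the new configuration, the htrace CLI from earlier gives a quick end-to-end check. The --addr flag is listed in the --help output above; I am assuming it applies to these subcommands and takes the same host and port as htraced.rest.url:

    ./htrace-core/src/go/build/htrace --addr=lumbergh.initech.com:9095 serverInfo
    ./htrace-core/src/go/build/htrace --addr=lumbergh.initech.com:9095 dumpAll

If serverInfo answers, the daemon is reachable; if dumpAll starts printing spans after you run an ordinary HDFS command such as hadoop fs -ls /, spans are flowing from Hadoop into htraced.
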
> I hope this helps some folks out. Hopefully building Hadoop and massaging
> the classpath is not too bad. This install process will improve in the
> future as more projects get stable releases with HTrace. There has also
> been some discussion of making Docker images, which might help new
> developers get started.
>
> best,
> Colin

--
*Lewis*
