Is the website on CMS?

On Thu, Mar 5, 2015 at 5:18 PM, Colin P. McCabe <[email protected]> wrote:
> Can we set up a wiki? Stuff like this needs to be updated
> periodically and it would be nice to have something like the Hadoop
> wiki. Of course there may be some out-of-date stuff from time to
> time, but it's better than nothing...
>
> On Mon, Mar 2, 2015 at 8:52 PM, Lewis John Mcgibbney
> <[email protected]> wrote:
> > This is dynamite and I think it would be very helpful to have it
> > linked to from the website.
> > Although the install and config don't appear too bulky, there are a
> > number of steps, and this would be non-trivial for someone who is not
> > familiar with Hadoop's XML-based runtime configuration.
> > I'm finishing off a patch for Chukwa right now, then I will be
> > building HTrace into my Nutch 2.x search stack. My aim is to write
> > something similar for that deployment, as it would also be very
> > helpful to see tracing for Gora data stores.
>
> Awesome.
>
> best,
> Colin
>
> > On Monday, March 2, 2015, Colin P. McCabe <[email protected]> wrote:
> >
> >> A few people have asked how to get started with HTrace development.
> >> It's a good question and we don't have a great README up about it, so
> >> I thought I would write something.
> >>
> >> HTrace is all about tracing distributed systems. So the best way to
> >> get started is to plug HTrace into your favorite distributed system
> >> and see what cool things happen or what bugs pop up. Since I'm an HDFS
> >> developer, that's the distributed system I'm most familiar with, so I
> >> will do a quick writeup about how to use HTrace + HDFS. (HBase +
> >> HTrace is another very important use case that I would like to write
> >> about later, but one step at a time.)
> >>
> >> Just a quick note: a lot of this software is relatively new, so there
> >> may be bugs or integration pain points that you encounter.
> >>
> >> There has not yet been a stable release of Hadoop that contained
> >> Apache HTrace. There have been releases that contained the pre-Apache
> >> version of HTrace, but that's no fun. If we want to do development, we
> >> want to be able to run the latest version of the code, so we will have
> >> to build it ourselves.
> >>
> >> Building HTrace is not too bad. First we install the dependencies:
> >>
> >> > cmccabe@keter:~/> apt-get install java javac google-go leveldb-devel
> >>
> >> If you have a different Linux distro, this command will vary slightly,
> >> of course. On Macs, "brew" is a good option.
> >>
> >> Next we use Maven to build the source:
> >>
> >> > cmccabe@keter:~/> git clone https://git-wip-us.apache.org/repos/asf/incubator-htrace.git
> >> > cmccabe@keter:~/> cd incubator-htrace
> >> > cmccabe@keter:~/> git checkout master
> >> > cmccabe@keter:~/> mvn install -DskipTests -Dmaven.javadoc.skip=true -Drat.skip
> >>
> >> OK. So HTrace is built and installed to the local ~/.m2 directory.
> >> We should see it under the .m2:
> >>
> >> > cmccabe@keter:~/> find ~/.m2 | grep htrace-core
> >> > ...
> >> > /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT
> >> > /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.jar.lastUpdated
> >> > /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-SNAPSHOT/htrace-core-3.2.0-SNAPSHOT.pom.lastUpdated
> >> > ...
> >>
> >> The version you built should be 3.2.0-SNAPSHOT.
> >>
> >> Next, we check out Hadoop:
> >>
> >> > cmccabe@keter:~/> git clone https://git-wip-us.apache.org/repos/asf/hadoop.git
> >> > cmccabe@keter:~/> cd hadoop
> >> > cmccabe@keter:~/> git checkout branch-2
> >>
> >> So we are basically building a pre-release version of Hadoop 2.7,
> >> currently known as branch-2. We will need to modify Hadoop to use
> >> 3.2.0-SNAPSHOT rather than the stable 3.1.0 release that branch-2
> >> would ordinarily use. I applied this diff to hadoop-project/pom.xml:
> >>
> >> > diff --git a/hadoop-project/pom.xml b/hadoop-project/pom.xml
> >> > index 569b292..5b7e466 100644
> >> > --- a/hadoop-project/pom.xml
> >> > +++ b/hadoop-project/pom.xml
> >> > @@ -785,7 +785,7 @@
> >> >        <dependency>
> >> >          <groupId>org.apache.htrace</groupId>
> >> >          <artifactId>htrace-core</artifactId>
> >> > -        <version>3.1.0-incubating</version>
> >> > +        <version>3.2.0-incubating-SNAPSHOT</version>
> >> >        </dependency>
> >> >        <dependency>
> >> >          <groupId>org.jdom</groupId>
> >>
> >> Next, I built Hadoop:
> >>
> >> > cmccabe@keter:~/> mvn package -Pdist -DskipTests -Dmaven.javadoc.skip=true
> >>
> >> You should get a package with Hadoop jars named like so:
> >>
> >> > ...
> >> > ./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-codec-1.4.jar
> >> > ./hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-2.7.0-SNAPSHOT/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar
> >> > ...
> >>
> >> This package should also contain an htrace-3.2.0-SNAPSHOT jar.
> >>
> >> OK, so how can we start seeing some trace spans? The easiest way is to
> >> configure LocalFileSpanReceiver. Add this to your hdfs-site.xml:
> >>
> >> > <property>
> >> >   <name>hadoop.htrace.spanreceiver.classes</name>
> >> >   <value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
> >> > </property>
> >> > <property>
> >> >   <name>hadoop.htrace.sampler</name>
> >> >   <value>AlwaysSampler</value>
> >> > </property>
> >>
> >> When you run the Hadoop daemons, you should see them writing to files
> >> named /tmp/${PROCESS_ID} (one for each process). If this doesn't
> >> happen, try cranking up your log4j level to TRACE to see why the
> >> SpanReceiver could not be created.
> >>
> >> You should see something like this in the log4j logs:
> >>
> >> > 13:28:33,885 TRACE SpanReceiverBuilder:94 - Created new span receiver of type org.apache.htrace.impl.LocalFileSpanReceiver
> >> >   at org.apache.htrace.SpanReceiverBuilder.build(SpanReceiverBuilder.java:92)
> >> >   at org.apache.hadoop.tracing.SpanReceiverHost.loadInstance(SpanReceiverHost.java:161)
> >> >   at org.apache.hadoop.tracing.SpanReceiverHost.loadSpanReceivers(SpanReceiverHost.java:147)
> >> >   at org.apache.hadoop.tracing.SpanReceiverHost.getInstance(SpanReceiverHost.java:82)
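(To have something show up in those files, you need traced activity. One
way is to wrap an HDFS client call in a trace scope yourself. This is a
sketch only, assuming the branch-2 FileSystem API and the htrace 3.x
classes above; the path /tmp/demo.txt and the span name are placeholders,
and it assumes the hdfs-site.xml with the span receiver config is on the
client's classpath.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.htrace.Sampler;
    import org.apache.htrace.Trace;
    import org.apache.htrace.TraceScope;

    public class TracedRead {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Work done inside this scope, including the RPCs the DFS
        // client makes, should be recorded as child spans.
        TraceScope scope = Trace.startSpan("tracedReadDemo", Sampler.ALWAYS);
        try (FSDataInputStream in = fs.open(new Path("/tmp/demo.txt"))) {
          in.read();
        } finally {
          scope.close();
        }
      }
    }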
> >>
> >> Running htraced is easy. You simply run the binary:
> >>
> >> > cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htraced -Dlog.level=TRACE -Ddata.store.clear
> >>
> >> You should see messages like this:
> >>
> >> > 2015-03-02T19:08:33-08:00 D: HTRACED_CONF_DIR=/home/cmccabe/conf
> >> > 2015-03-02T19:08:33-08:00 D: data.store.clear = true
> >> > 2015-03-02T19:08:33-08:00 D: log.level = TRACE
> >> > 2015-03-02T19:08:33-08:00 I: Cleared existing datastore directory /tmp/htrace1/db
> >> > 2015-03-02T19:08:33-08:00 D: LevelDB failed to open /tmp/htrace1/db: Invalid argument: /tmp/htrace1/db: does not exist (create_if_missing is false)
> >> > 2015-03-02T19:08:33-08:00 I: Created new LevelDB instance in /tmp/htrace1/db
> >> > 2015-03-02T19:08:33-08:00 T: Wrote layout version 2 to shard at /tmp/htrace1/db.
> >> > 2015-03-02T19:08:33-08:00 I: Cleared existing datastore directory /tmp/htrace2/db
> >> > 2015-03-02T19:08:33-08:00 D: LevelDB failed to open /tmp/htrace2/db: Invalid argument: /tmp/htrace2/db: does not exist (create_if_missing is false)
> >> > 2015-03-02T19:08:33-08:00 I: Created new LevelDB instance in /tmp/htrace2/db
> >> > 2015-03-02T19:08:33-08:00 T: Wrote layout version 2 to shard at /tmp/htrace2/db.
> >> > ...
> >>
> >> Similar to the Hadoop daemons, htraced can be configured either
> >> through an XML file named htraced-conf.xml (found in a location
> >> pointed to by HTRACED_CONF_DIR), or by passing -Dkey=value flags on
> >> the command line.
> >>
> >> Let's check out the htrace command:
> >>
> >> > cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htrace serverInfo
> >> > HTraced server version 3.2.0-incubating-SNAPSHOT (5c0a712c7dd4263f5e2a88d4c61a0facab25953f)
> >>
> >> "serverInfo" queries the htraced server via REST and gets back a
> >> response. For help using the htrace command, we can run:
> >>
> >> > cmccabe@keter:~/src/htrace> ./htrace-core/src/go/build/htrace --help
> >> > usage: ./htrace-core/src/go/build/htrace [<flags>] <command> [<flags>] [<args> ...]
> >> >
> >> > The Apache HTrace command-line tool. This tool retrieves and modifies settings and other data on a running htraced daemon.
> >> >
> >> > If we find an htraced-conf.xml configuration file in the list of directories specified in HTRACED_CONF_DIR, we will use that configuration; otherwise, the defaults will be used.
> >> >
> >> > Flags:
> >> >   --help  Show help.
> >> >   --Dmy.key="my.value"
> >> >           Set configuration key 'my.key' to 'my.value'. Replace 'my.key' with any key you want to set.
> >> >   --addr=ADDR  Server address.
> >> >   --verbose    Verbose.
> >> >
> >> > Commands:
> >> >   help [<command>]
> >> >     Show help for a command.
> >> >   ...
> >>
> >> We can load spans into the htraced daemon from a text file using
> >> ./build/htrace loadSpans [file-path], and dump the span information
> >> using ./build/htrace dumpAll.
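(Under the hood, "serverInfo" is just an HTTP GET against htraced's REST
port. If you would rather poke at the daemon from Java, something like
the sketch below should work. Note the assumptions: the /server/info
path is a guess at htraced's REST layout, and localhost:9095 matches the
default port used elsewhere in this thread.)

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class HtracedPing {
      public static void main(String[] args) throws Exception {
        // ASSUMPTION: htraced is listening on localhost:9095 and serves
        // server info at /server/info.
        URL url = new URL("http://localhost:9095/server/info");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
          String line;
          while ((line = reader.readLine()) != null) {
            System.out.println(line);  // JSON describing the server version
          }
        } finally {
          conn.disconnect();
        }
      }
    }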
> >>
> >> Now, at this point, we would like our HTrace client (Hadoop) to send
> >> spans directly to htraced, rather than dumping them to a local file.
> >> To make this work, we will need to put the htrace-htraced jar on the
> >> Hadoop CLASSPATH.
> >>
> >> There is probably a better way to do it by setting HADOOP_CLASSPATH,
> >> but this simple script just puts the jar on every part of the Hadoop
> >> CLASSPATH I could think of where it might need to be:
> >>
> >> > #!/bin/bash
> >> >
> >> > # Copy the installed version of htrace-core to the correct hadoop
> >> > # jar locations (xargs passes each destination to cp in turn).
> >> > cat << EOF | xargs -n 1 cp /home/cmccabe/.m2/repository/org/apache/htrace/htrace-core/3.2.0-incubating-SNAPSHOT/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> >> > /home/cmccabe/hadoop-install/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> >> > /home/cmccabe/hadoop-install/share/hadoop/hdfs/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> >> > /home/cmccabe/hadoop-install/share/hadoop/tools/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> >> > /home/cmccabe/hadoop-install/share/hadoop/common/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> >> > /home/cmccabe/hadoop-install/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/htrace-core-3.2.0-incubating-SNAPSHOT.jar
> >> > EOF
> >> >
> >> > # Copy the installed version of htrace-htraced to the correct hadoop
> >> > # jar locations.
> >> > cat << EOF | xargs -n 1 cp /home/cmccabe/.m2/repository/org/apache/htrace/htrace-htraced/3.2.0-incubating-SNAPSHOT/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
> >> > /home/cmccabe/hadoop-install/share/hadoop/hdfs/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
> >> > /home/cmccabe/hadoop-install/share/hadoop/tools/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
> >> > /home/cmccabe/hadoop-install/share/hadoop/common/lib/htrace-htraced-3.2.0-incubating-SNAPSHOT.jar
> >> > EOF
> >>
> >> At this point, I changed hdfs-site.xml so that
> >> hadoop.htrace.spanreceiver.classes was set to the htraced span
> >> receiver:
> >>
> >> > <property>
> >> >   <name>hadoop.htrace.spanreceiver.classes</name>
> >> >   <value>org.apache.htrace.impl.HTracedRESTReceiver</value>
> >> > </property>
> >> > <property>
> >> >   <name>htraced.rest.url</name>
> >> >   <value>http://lumbergh.initech.com:9095/</value>
> >> > </property>
> >>
> >> Obviously, set htraced.rest.url to the host on which you are running
> >> htraced. This setup should work for sending spans to htraced.
> >>
> >> To see the web UI, point your web browser at
> >> http://lumbergh.initech.com:9095/ (or whatever the host name is where
> >> htraced is running for you).
> >>
> >> I hope this helps some folks out. Hopefully building Hadoop and
> >> massaging the classpath is not too bad. This install process will
> >> improve in the future, as more projects get stable releases with
> >> HTrace. There has also been some discussion of making Docker images,
> >> which might help new developers get started.
> >>
> >> best,
> >> Colin
> >
> > --
> > *Lewis*

--
*Lewis*
