On Thu, Feb 12, 2015 at 1:23 PM, Chunxu Tang <chunxut...@gmail.com> wrote:
> Hi all,
>
> Thanks for your detailed replies!
>
> Now I have tested end-to-end tracing with two versions of HBase (0.98.10 and
> 0.99.2), combined with Hadoop 2.6.0 and htrace-master (3.0.4), and both of
> them failed. HBase 0.98.10 actually ships the htrace 2.0.4 core, so it is
> expected that I get no traces there. HBase 0.99.2, however, has the htrace
> 3.0.4 core, yet I still cannot get traces of HDFS; I can only get traces of HBase.

Hadoop 2.6.0 doesn't have the correct version of HTrace, so this is all
expected.  You aren't going to be able to do anything useful here as
long as you keep using Hadoop 2.6.0.  I would suggest using
Hadoop-2.7.0-SNAPSHOT with an appropriate version of HBase.
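
For example, roughly (a sketch only, assuming you have the Hadoop and HBase
source trees checked out locally; exact branch names and flags may differ
for your setup):

  # In the Hadoop checkout: build branch-2 and install the 2.7.0-SNAPSHOT
  # artifacts into the local Maven cache (~/.m2).
  git checkout branch-2
  mvn install -DskipTests

  # In the HBase checkout: build against the snapshot you just installed.
  # (Some HBase branches use -Dhadoop-two.version instead of -Dhadoop.version.)
  git checkout branch-1.0
  mvn clean install -DskipTests -Dhadoop.version=2.7.0-SNAPSHOT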

Hope this helps.

best,
Colin


>
> I think the first thing I need to make sure of is that I am using a correct
> method to implement the end-to-end test. I'm not sure whether it's appropriate
> to post the whole source code on the mailing list, so I'll just show the core
> chunks of the client code here:
>
> public void run() {
>         Configuration conf = HBaseConfiguration.create();
>         // Register the span receivers for both HBase and HDFS
>         org.apache.hadoop.hbase.trace.SpanReceiverHost.getInstance(conf);
>         org.apache.hadoop.tracing.SpanReceiverHost.getInstance(new
> HdfsConfiguration());
>
>         // Start a top-level span; everything below should be traced
>         TraceScope ts = Trace.startSpan("Gets", Sampler.ALWAYS);
>         HTable table = new HTable(conf, "t1");
>         Get get = new Get(Bytes.toBytes("r1"));
>         table.get(get);
>         ...
> }
>
> Now I can only get traces of HBase, ending at the HFileReaderV2.readBlock()
> function. Is my testing method correct? And since I'm not familiar with the
> new version of HTrace, or with HBase/HDFS built against the new htrace core,
> could you give me some suggestions on where the error may be occurring?
>
> Thank you all.
>
> Joshua
>
> 2015-02-11 22:08 GMT-05:00 Colin P. McCabe <cmcc...@apache.org>:
>
>> No, I think I'm the one who's missing something. :)
>>
>> I will give that a try next time I'm testing out end-to-end tracing.
>>
>> thanks guys.
>> Colin
>>
>> On Wed, Feb 11, 2015 at 4:36 PM, Enis Söztutar <enis....@gmail.com> wrote:
>> > mvn install just installs it in the local cache, which you can then use for
>> > building other projects. So there's no need to define a file-based local
>> > repo. Am I missing something?
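>> >
>> > For example (hypothetical paths, assuming the Hadoop tree is on a branch
>> > that produces 2.7.0-SNAPSHOT):
>> >
>> >   cd hadoop && mvn install -DskipTests   # installs the SNAPSHOT artifacts into ~/.m2
>> >
>> > and an HBase build with -Dhadoop.version=2.7.0-SNAPSHOT should then
>> > resolve them from there.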
>> >
>> > Enis
>> >
>> >> > On Wed, Feb 11, 2015 at 12:36 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
>> >
>> >> Oh, I see. I was assuming a local build of the Hadoop snapshot installed
>> >> into the local cache.
>> >>
>> >> On Wednesday, February 11, 2015, Colin P. McCabe <cmcc...@apache.org>
>> >> wrote:
>> >>
>> >> > On Wed, Feb 11, 2015 at 11:27 AM, Nick Dimiduk <ndimi...@gmail.com> wrote:
>> >> > > I don't recall the hadoop release repo restriction being a problem, but I
>> >> > > haven't tested it lately. See if you can just specify the release version
>> >> > > with -Dhadoop.version or -Dhadoop-two.version.
>> >> > >
>> >> >
>> >> > Sorry, it's been a while since I did this... I guess the question is
>> >> > whether 2.7.0-SNAPSHOT is available in Maven-land somewhere?  If so,
>> >> > then Chunxu should forget all that stuff I said, and just build HBase
>> >> > with -Dhadoop.version=2.7.0-SNAPSHOT
>> >> >
>> >> > > I would go against branch-1.0, as this will be the imminent 1.0.0
>> >> > > release and has HTrace 3.1.0-incubating.
>> >> >
>> >> > Thanks.
>> >> >
>> >> > Colin
>> >> >
>> >> >
>> >> > >
>> >> > > -n
>> >> > >
>> >> > > On Wed, Feb 11, 2015 at 11:13 AM, Colin P. McCabe <cmcc...@apache.org>
>> >> > > wrote:
>> >> > >
>> >> > >> Thanks for trying stuff out!  Sorry that this is a little difficult
>> >> > >> at the moment.
>> >> > >>
>> >> > >> To really do this right, you would want to be using Hadoop with HTrace
>> >> > >> 3.1.0, and HBase with HTrace 3.1.0.  Unfortunately, there hasn't been
>> >> > >> a new release of Hadoop with HTrace 3.1.0.  The only existing releases
>> >> > >> of Hadoop use an older version of the HTrace library.  So you will
>> >> > >> have to build from source.
>> >> > >>
>> >> > >> If you check out Hadoop's "branch-2" branch (currently, this branch
>> >> > >> represents what will be in the 2.7 release, when it is cut), and build
>> >> > >> that, you will get the latest.  Then you have to build a version of
>> >> > >> HBase against the version of Hadoop you have built.
>> >> > >>
>> >> > >> By default, HBase's Maven build will build against upstream release
>> >> > >> versions of Hadoop only. So just setting
>> >> > >> -Dhadoop.version=2.7.0-SNAPSHOT is not enough, since it won't know
>> >> > >> where to find the jars.  To get around this problem, you can create
>> >> > >> your own local maven repo. Here's how.
>> >> > >>
>> >> > >> In hadoop/pom.xml, add these lines to the distributionManagement stanza:
>> >> > >>
>> >> > >> +    <repository>
>> >> > >> +      <id>localdump</id>
>> >> > >> +      <url>file:///home/cmccabe/localdump/releases</url>
>> >> > >> +    </repository>
>> >> > >> +    <snapshotRepository>
>> >> > >> +      <id>localdump</id>
>> >> > >> +      <url>file:///home/cmccabe/localdump/snapshots</url>
>> >> > >> +    </snapshotRepository>
>> >> > >>
>> >> > >> Comment out the repositories that are already there.
>> >> > >>
>> >> > >> Now run mkdir /home/cmccabe/localdump.
>> >> > >>
>> >> > >> Then, in your hadoop tree, run mvn deploy -DskipTests.
>> >> > >>
>> >> > >> You should get a localdump directory that has files kind of like this:
>> >> > >>
>> >> > >> ...
>> >> > >> /home/cmccabe/localdump/snapshots/org/apache/hadoop
>> >> > >> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce
>> >> > >> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/maven-metadata.xml.md5
>> >> > >> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/2.7.0-SNAPSHOT
>> >> > >> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/2.7.0-SNAPSHOT/maven-metadata.xml.md5
>> >> > >> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/2.7.0-SNAPSHOT/hadoop-mapreduce-2.7.0-20121120.230341-1.pom.sha1
>> >> > >> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/2.7.0-SNAPSHOT/maven-metadata.xml
>> >> > >> ...
>> >> > >>
>> >> > >> Now, add the following lines to your HBase pom.xml:
>> >> > >>
>> >> > >>    <repositories>
>> >> > >>      <repository>
>> >> > >> +      <id>localdump</id>
>> >> > >> +      <url>file:///home/cmccabe/localdump</url>
>> >> > >> +      <name>Local Dump</name>
>> >> > >> +      <snapshots>
>> >> > >> +        <enabled>true</enabled>
>> >> > >> +      </snapshots>
>> >> > >> +      <releases>
>> >> > >> +        <enabled>true</enabled>
>> >> > >> +      </releases>
>> >> > >> +    </repository>
>> >> > >> +    <repository>
>> >> > >>
>> >> > >> This will allow you to run something like:
>> >> > >> mvn test -Dtest=TestMiniClusterLoadSequential -PlocalTests
>> >> > >> -DredirectTestOutputToFile=true -Dhadoop.profile=2.0
>> >> > >> -Dhadoop.version=2.7.0-SNAPSHOT -Dcdh.hadoop.version=2.7.0-SNAPSHOT
>> >> > >>
>> >> > >> Once we do a new release of Hadoop with HTrace 3.1.0 this will get a
>> >> > >> lot easier.
>> >> > >>
>> >> > >> Related: does anyone know which HBase git branch would be best to build
>> >> > >> from for this kind of testing?  I've been meaning to do some end-to-end
>> >> > >> testing (it's been on my TODO list for a while).
>> >> > >>
>> >> > >> best,
>> >> > >> Colin
>> >> > >>
>> >> > >> On Wed, Feb 11, 2015 at 7:55 AM, Chunxu Tang <chunxut...@gmail.com> wrote:
>> >> > >> > Hi all,
>> >> > >> >
>> >> > >> > Now I’m using HTrace to trace request-level data flows in HBase and
>> >> > >> > HDFS. I have successfully traced HBase and HDFS with HTrace, each on
>> >> > >> > its own.
>> >> > >> >
>> >> > >> > After that, I combined HBase and HDFS, and I want to send just a
>> >> > >> > PUT/GET request to HBase but trace the whole data flow through both
>> >> > >> > HBase and HDFS. My understanding is that when I send a request such
>> >> > >> > as a Get to HBase, it will eventually read the blocks on HDFS, so I
>> >> > >> > should be able to construct a complete data-flow trace through HBase
>> >> > >> > and HDFS. However, in fact I can only get tracing data from HBase,
>> >> > >> > with no data from HDFS.
>> >> > >> >
>> >> > >> > Could you give me any suggestions on how to trace the data flow in
>> >> > >> > both HBase and HDFS? Does anyone have similar experience? Do I need
>> >> > >> > to modify the source code, and if so, which part(s) should I touch?
>> >> > >> > If I do need to modify the code, I will try to create a patch for it.
>> >> > >> >
>> >> > >> > Thank you.
>> >> > >> >
>> >> > >> > My Configurations:
>> >> > >> > Hadoop version: 2.6.0
>> >> > >> > HBase version: 0.99.2
>> >> > >> > HTrace version: htrace-master
>> >> > >> > OS: Ubuntu 12.04
>> >> > >> >
>> >> > >> >
>> >> > >> > Joshua
>> >> > >>
>> >> >
>> >>
>>
