On Wed, Feb 11, 2015 at 11:27 AM, Nick Dimiduk <ndimi...@gmail.com> wrote:
> I don't recall the hadoop release repo restriction being a problem, but I
> haven't tested it lately. See if you can just specify the release version
> with -Dhadoop.version or -Dhadoop-two.version.
>
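Concretely, that suggestion amounts to an invocation along these lines (a
sketch only; the released Hadoop version to plug in is just an example):

  mvn clean install -DskipTests -Dhadoop-two.version=2.6.0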
Sorry, it's been a while since I did this... I guess the question is
whether 2.7.0-SNAPSHOT is available in Maven-land somewhere? If so, then
Chunxu should forget all that stuff I said, and just build HBase with
-Dhadoop.version=2.7.0-SNAPSHOT.

> I would go against branch-1.0 as this will be the imminent 1.0.0 release
> and has HTrace 3.1.0-incubating.

Thanks.
Colin

> -n
>
> On Wed, Feb 11, 2015 at 11:13 AM, Colin P. McCabe <cmcc...@apache.org>
> wrote:
>
>> Thanks for trying stuff out! Sorry that this is a little difficult at
>> the moment.
>>
>> To really do this right, you want to be using Hadoop with HTrace 3.1.0
>> and HBase with HTrace 3.1.0. Unfortunately, there hasn't been a new
>> release of Hadoop with HTrace 3.1.0; the existing releases all use an
>> older version of the HTrace library. So you will have to build from
>> source.
>>
>> If you check out Hadoop's "branch-2" branch (currently, this branch
>> represents what will be in the 2.7 release, once it is cut) and build
>> that, you will get the latest. Then you have to build HBase against
>> the version of Hadoop you have built.
>>
>> By default, HBase's Maven build only builds against upstream release
>> versions of Hadoop. So just setting -Dhadoop.version=2.7.0-SNAPSHOT is
>> not enough, since Maven won't know where to find the jars. To get
>> around this problem, you can create your own local Maven repo. Here's
>> how.
>>
>> In hadoop/pom.xml, add these lines to the distributionManagement stanza:
>>
>> +    <repository>
>> +      <id>localdump</id>
>> +      <url>file:///home/cmccabe/localdump/releases</url>
>> +    </repository>
>> +    <snapshotRepository>
>> +      <id>localdump</id>
>> +      <url>file:///home/cmccabe/localdump/snapshots</url>
>> +    </snapshotRepository>
>>
>> Comment out the repositories that are already there.
>>
>> Now run mkdir /home/cmccabe/localdump.
>>
>> Then, in your hadoop tree, run mvn deploy -DskipTests.
>>
>> You should get a localdump directory with files kind of like this:
>>
>> ...
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/maven-metadata.xml.md5
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/2.7.0-SNAPSHOT
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/2.7.0-SNAPSHOT/maven-metadata.xml.md5
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/2.7.0-SNAPSHOT/hadoop-mapreduce-2.7.0-20121120.230341-1.pom.sha1
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/2.7.0-SNAPSHOT/maven-metadata.xml
>> ...
>>
>> Now, add the following lines to the repositories section of your HBase
>> pom.xml, before the existing repository entries:
>>
>>      <repositories>
>> +      <repository>
>> +        <id>localdump</id>
>> +        <url>file:///home/cmccabe/localdump</url>
>> +        <name>Local Dump</name>
>> +        <snapshots>
>> +          <enabled>true</enabled>
>> +        </snapshots>
>> +        <releases>
>> +          <enabled>true</enabled>
>> +        </releases>
>> +      </repository>
>>        <repository>
>>
>> This will allow you to run something like:
>>
>> mvn test -Dtest=TestMiniClusterLoadSequential -PlocalTests
>> -DredirectTestOutputToFile=true -Dhadoop.profile=2.0
>> -Dhadoop.version=2.7.0-SNAPSHOT -Dcdh.hadoop.version=2.7.0-SNAPSHOT
>>
>> Once we do a new release of Hadoop with HTrace 3.1.0, this will get a
>> lot easier.
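>>
>> Putting those steps together, the whole sequence looks roughly like
>> this (a sketch, reusing the example paths above; adjust them for your
>> machine):
>>
>>   # In your hadoop tree, after editing the distributionManagement
>>   # stanza as shown above:
>>   git checkout branch-2
>>   mkdir -p /home/cmccabe/localdump
>>   mvn deploy -DskipTests
>>
>>   # In your hbase tree, after adding the localdump repository to
>>   # pom.xml:
>>   mvn test -Dtest=TestMiniClusterLoadSequential -PlocalTests \
>>     -DredirectTestOutputToFile=true -Dhadoop.profile=2.0 \
>>     -Dhadoop.version=2.7.0-SNAPSHOT -Dcdh.hadoop.version=2.7.0-SNAPSHOT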
>>
>> Related: does anyone know which HBase git branch would be best to
>> build from for this kind of testing? I've been meaning to do some
>> end-to-end testing (it's been on my TODO list for a while).
>>
>> best,
>> Colin
>>
>> On Wed, Feb 11, 2015 at 7:55 AM, Chunxu Tang <chunxut...@gmail.com> wrote:
>> > Hi all,
>> >
>> > I'm currently using HTrace to trace request-level data flows in HBase
>> > and HDFS. I have successfully traced HBase and HDFS separately.
>> >
>> > Next, I want to combine the two: send a single PUT/GET request to
>> > HBase and trace the whole data flow through both HBase and HDFS. As I
>> > understand it, when I send a request such as a Get to HBase, it will
>> > eventually read blocks from HDFS, so I should be able to construct a
>> > single trace that spans HBase and HDFS. In practice, however, I only
>> > get tracing data for HBase, with nothing from HDFS.
>> >
>> > Could you give me any suggestions on how to trace the data flow
>> > through both HBase and HDFS? Does anyone have similar experience? Do
>> > I need to modify the source code, and if so, which part(s) should I
>> > touch? If code changes are needed, I will try to create a patch.
>> >
>> > Thank you.
>> >
>> > My configuration:
>> > Hadoop version: 2.6.0
>> > HBase version: 0.99.2
>> > HTrace version: htrace-master
>> > OS: Ubuntu 12.04
>> >
>> >
>> > Joshua
>>
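For concreteness, the kind of client-side span Chunxu describes could be
opened like this with the HTrace 3.1 API (org.apache.htrace.Trace /
Sampler) and an HBase 1.0-style client. This is only a sketch: the class,
table, and row names are hypothetical, and it assumes span receivers are
already configured on the client and the servers:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.htrace.Sampler;
  import org.apache.htrace.Trace;
  import org.apache.htrace.TraceScope;

  public class TracedGet {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      try (Connection conn = ConnectionFactory.createConnection(conf);
           Table table = conn.getTable(TableName.valueOf("testtable"))) {
        // Open a top-level span. Child spans created inside HBase (and,
        // once the RPC layer propagates trace info, inside HDFS) should
        // attach to it.
        TraceScope scope = Trace.startSpan("client-get", Sampler.ALWAYS);
        try {
          Result result = table.get(new Get(Bytes.toBytes("row1")));
          System.out.println("got: " + result);
        } finally {
          scope.close();  // closing the scope ends the span
        }
      }
    }
  }

Whether the HDFS side then shows up in the trace depends on the servers
running an HTrace-3.1-based Hadoop build, per Colin's instructions above.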