On Wed, Feb 11, 2015 at 11:27 AM, Nick Dimiduk <ndimi...@gmail.com> wrote:
> I don't recall the hadoop release repo restriction being a problem, but I
> haven't tested it lately. See if you can just specify the release version
> with -Dhadoop.version or -Dhadoop-two.version.
>
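Concretely, that suggestion amounts to an invocation along these lines (a
sketch only; the released Hadoop version to plug in is just an example):

  mvn clean install -DskipTests -Dhadoop-two.version=2.6.0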
Sorry, it's been a while since I did this... I guess the question is
whether 2.7.0-SNAPSHOT is available in Maven-land somewhere? If so, then
Chunxu should forget all that stuff I said, and just build HBase with
-Dhadoop.version=2.7.0-SNAPSHOT.

> I would go against branch-1.0 as this will be the imminent 1.0.0 release
> and has HTrace 3.1.0-incubating.

Thanks.
Colin

> -n
>
> On Wed, Feb 11, 2015 at 11:13 AM, Colin P. McCabe <cmcc...@apache.org>
> wrote:
>
>> Thanks for trying stuff out! Sorry that this is a little difficult at
>> the moment.
>>
>> To really do this right, you want to be using Hadoop with HTrace 3.1.0
>> and HBase with HTrace 3.1.0. Unfortunately, there hasn't been a new
>> release of Hadoop with HTrace 3.1.0; the existing releases all use an
>> older version of the HTrace library. So you will have to build from
>> source.
>>
>> If you check out Hadoop's "branch-2" branch (currently, this branch
>> represents what will be in the 2.7 release, once it is cut) and build
>> that, you will get the latest. Then you have to build HBase against
>> the version of Hadoop you have built.
>>
>> By default, HBase's Maven build only builds against upstream release
>> versions of Hadoop. So just setting -Dhadoop.version=2.7.0-SNAPSHOT is
>> not enough, since Maven won't know where to find the jars. To get
>> around this problem, you can create your own local Maven repo. Here's
>> how.
>>
>> In hadoop/pom.xml, add these lines to the distributionManagement stanza:
>>
>> +    <repository>
>> +      <id>localdump</id>
>> +      <url>file:///home/cmccabe/localdump/releases</url>
>> +    </repository>
>> +    <snapshotRepository>
>> +      <id>localdump</id>
>> +      <url>file:///home/cmccabe/localdump/snapshots</url>
>> +    </snapshotRepository>
>>
>> Comment out the repositories that are already there.
>>
>> Now run mkdir /home/cmccabe/localdump.
>>
>> Then, in your hadoop tree, run mvn deploy -DskipTests.
>>
>> You should get a localdump directory with files kind of like this:
>>
>> ...
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/maven-metadata.xml.md5
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/2.7.0-SNAPSHOT
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/2.7.0-SNAPSHOT/maven-metadata.xml.md5
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/2.7.0-SNAPSHOT/hadoop-mapreduce-2.7.0-20121120.230341-1.pom.sha1
>> /home/cmccabe/localdump/snapshots/org/apache/hadoop/hadoop-mapreduce/2.7.0-SNAPSHOT/maven-metadata.xml
>> ...
>>
>> Now, add the following lines to the repositories section of your HBase
>> pom.xml, before the existing repository entries:
>>
>>      <repositories>
>> +      <repository>
>> +        <id>localdump</id>
>> +        <url>file:///home/cmccabe/localdump</url>
>> +        <name>Local Dump</name>
>> +        <snapshots>
>> +          <enabled>true</enabled>
>> +        </snapshots>
>> +        <releases>
>> +          <enabled>true</enabled>
>> +        </releases>
>> +      </repository>
>>        <repository>
>>
>> This will allow you to run something like:
>>
>> mvn test -Dtest=TestMiniClusterLoadSequential -PlocalTests
>> -DredirectTestOutputToFile=true -Dhadoop.profile=2.0
>> -Dhadoop.version=2.7.0-SNAPSHOT -Dcdh.hadoop.version=2.7.0-SNAPSHOT
>>
>> Once we do a new release of Hadoop with HTrace 3.1.0, this will get a
>> lot easier.
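>>
>> Putting those steps together, the whole sequence looks roughly like
>> this (a sketch, reusing the example paths above; adjust them for your
>> machine):
>>
>>   # In your hadoop tree, after editing the distributionManagement
>>   # stanza as shown above:
>>   git checkout branch-2
>>   mkdir -p /home/cmccabe/localdump
>>   mvn deploy -DskipTests
>>
>>   # In your hbase tree, after adding the localdump repository to
>>   # pom.xml:
>>   mvn test -Dtest=TestMiniClusterLoadSequential -PlocalTests \
>>     -DredirectTestOutputToFile=true -Dhadoop.profile=2.0 \
>>     -Dhadoop.version=2.7.0-SNAPSHOT -Dcdh.hadoop.version=2.7.0-SNAPSHOT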
>>
>> Related: does anyone know which HBase git branch would be best to
>> build from for this kind of testing? I've been meaning to do some
>> end-to-end testing (it's been on my TODO list for a while).
>>
>> best,
>> Colin
>>
>> On Wed, Feb 11, 2015 at 7:55 AM, Chunxu Tang <chunxut...@gmail.com> wrote:
>> > Hi all,
>> >
>> > I'm currently using HTrace to trace request-level data flows in HBase
>> > and HDFS. I have successfully traced HBase and HDFS separately.
>> >
>> > Next, I want to combine the two: send a single PUT/GET request to
>> > HBase and trace the whole data flow through both HBase and HDFS. As I
>> > understand it, when I send a request such as a Get to HBase, it will
>> > eventually read blocks from HDFS, so I should be able to construct a
>> > single trace that spans HBase and HDFS. In practice, however, I only
>> > get tracing data for HBase, with nothing from HDFS.
>> >
>> > Could you give me any suggestions on how to trace the data flow
>> > through both HBase and HDFS? Does anyone have similar experience? Do
>> > I need to modify the source code, and if so, which part(s) should I
>> > touch? If code changes are needed, I will try to create a patch.
>> >
>> > Thank you.
>> >
>> > My configuration:
>> > Hadoop version: 2.6.0
>> > HBase version: 0.99.2
>> > HTrace version: htrace-master
>> > OS: Ubuntu 12.04
>> >
>> >
>> > Joshua
>>
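For concreteness, the kind of client-side span Chunxu describes could be
opened like this with the HTrace 3.1 API (org.apache.htrace.Trace /
Sampler) and an HBase 1.0-style client. This is only a sketch: the class,
table, and row names are hypothetical, and it assumes span receivers are
already configured on the client and the servers:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.htrace.Sampler;
  import org.apache.htrace.Trace;
  import org.apache.htrace.TraceScope;

  public class TracedGet {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      try (Connection conn = ConnectionFactory.createConnection(conf);
           Table table = conn.getTable(TableName.valueOf("testtable"))) {
        // Open a top-level span. Child spans created inside HBase (and,
        // once the RPC layer propagates trace info, inside HDFS) should
        // attach to it.
        TraceScope scope = Trace.startSpan("client-get", Sampler.ALWAYS);
        try {
          Result result = table.get(new Get(Bytes.toBytes("row1")));
          System.out.println("got: " + result);
        } finally {
          scope.close();  // closing the scope ends the span
        }
      }
    }
  }

Whether the HDFS side then shows up in the trace depends on the servers
running an HTrace-3.1-based Hadoop build, per Colin's instructions above.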