On Wed, Jul 1, 2009 at 10:11 PM, Dhruba Borthakur <dhr...@gmail.com> wrote:
> Hi Todd,
>
> Another option (one that is used by Hive) is to have an ant macro that can be overridden from the ant command line. This macro points to the location of the common.jar. By default, it is set to the same value as it is now. If a developer has a common jar that is built in his/her directory, he/she can set this macro from the command line while compiling hdfs.
>
> For example,
>   ant test
> does the same as it does now, but
>   ant -Dhadoop.common.jar=/home/dhruba/common/hadoop-common.jar test
> will pick up the common jar from my home directory.
>
> is this feasible?

That's feasible, but it will still require having a built jar in one repository or another after every new commit (yuck!). I imagine in Hive's case it's reasonably rare that you have to import a new Hadoop dev jar, since you mostly target existing stable releases. This is going to be happening all the time in MR/HDFS, at least for the foreseeable future imho.

-Todd

On Wed, Jul 1, 2009 at 6:45 PM, Todd Lipcon <tlip...@gmail.com> wrote:

> On Wed, Jul 1, 2009 at 2:10 PM, Philip Zeyliger <phi...@cloudera.com> wrote:
>
> > -1 to checking in jars. It's quite a bit of bloat in the repository (which admittedly affects the git.apache folks more than the svn folks), but it's also cumbersome to develop.
> >
> > It'd be nice to have a one-liner that builds the equivalent of the tarball built by "ant binary" in the old world. When you're working on something that affects both common and hdfs, it'll be pretty painful to make the jars in common, move them over to hdfs, and then compile hdfs.
> >
> > Could the build.xml in hdfs call into common's build.xml and build common as part of building hdfs? Or perhaps have a separate "top-level" build file that builds everything?
>
> Agree with Philip here. Requiring a new jar to be checked in anywhere after every common commit seems unscalable and nonperformant. For git users this will make the repository size balloon like crazy (the jar is 400KB and we have around 5300 commits so far = ~2GB!). For svn users it will still mean that every "svn update" requires a download of a new jar. Using svn externals to manage them also complicates things when trying to work on a cross-component patch with two dirty directories - you really need a symlink between your working directories rather than going through the SVN tree.
>
> I think it would be reasonable to require that developers check out a structure like:
>
>   working-dir/
>     hadoop-common/
>     hadoop-mapred/
>     hadoop-hdfs/
>
> We can then use relative paths for the mapred->common and hdfs->common dependencies. Those who only work on HDFS or only work on mapred will not have to check out the other, but everyone will check out common.
>
> Whether there exists a fourth repository (eg hadoop-build) that has a build.xml that ties together the other build.xmls is another open question IMO.
>
> -Todd
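
For reference, a minimal sketch of how the overridable-property approach could be wired into the hdfs build.xml, combined with the side-by-side checkout layout Todd describes. The property name (hadoop.common.jar), the relative ../hadoop-common path, and the target names are illustrative assumptions, not the actual build file:

    <!-- Sketch only: names and paths are assumptions, not the real hdfs build.xml.
         Ant properties are immutable, so a -D flag on the command line takes
         precedence over this default. -->
    <property name="hadoop.common.jar"
              value="${basedir}/../hadoop-common/build/hadoop-common.jar"/>

    <path id="common.classpath">
      <pathelement location="${hadoop.common.jar}"/>
    </path>

    <target name="compile">
      <javac srcdir="src/java" destdir="build/classes"
             classpathref="common.classpath"/>
    </target>

With that in place, "ant test" picks up the relative default, while "ant -Dhadoop.common.jar=/home/dhruba/common/hadoop-common.jar test" points the build at a locally built jar, as in Dhruba's example.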
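Philip's "top-level" build file could likewise be a thin wrapper that delegates to the per-project build.xmls. A rough sketch, assuming a hypothetical build.xml placed in working-dir/ and a "jar" target in each subproject (both assumptions for illustration):

    <!-- Sketch only: a hypothetical top-level build.xml in working-dir/,
         delegating to the side-by-side hadoop-common/hdfs/mapred checkouts. -->
    <project name="hadoop-all" default="build" basedir=".">
      <target name="build">
        <ant dir="hadoop-common" target="jar" inheritAll="false"/>
        <ant dir="hadoop-hdfs"   target="jar" inheritAll="false"/>
        <ant dir="hadoop-mapred" target="jar" inheritAll="false"/>
      </target>
    </project>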