That is a great point. I have been meaning to set up the Jenkins build for branch-2 for a while, so I took the 10 mins and just did it.
https://builds.apache.org/job/Hadoop-Common-2-Commit/

Don't let the name fool you: it publishes not just Common, but HDFS, YARN, MR, and tools too. You should now have branch-2 SNAPSHOTs updated on each commit to branch-2. Feel free to bug me if you need more integration points. I am not an RE guy, but I can hack it to make things work :)

--Bobby

On 3/5/13 12:15 AM, "Konstantin Boudnik" <[email protected]> wrote:

>Arun,
>
>First of all, I don't think anyone is trying to put the blame on
>someone else. E.g. I had a similar experience with Oozie being broken
>by certain released changes in the upstream.
>
>I am sure that most people in the BigTop community - especially those
>who share committership privileges in BigTop and other upstream
>projects, including Hadoop - would be happy to help with the
>stabilization of the Hadoop base. The issue that a downstream
>integration project is likely to have is - for one - the absence of
>regularly published development artifacts. In the spirit of "it didn't
>happen if there's no picture", here are a couple of examples:
>
> - 2.0.2-SNAPSHOT artifacts weren't published at all; only release
>   2.0.2-alpha artifacts were
> - 2.0.3-SNAPSHOT wasn't published until Feb 29, 2013 (it happened
>   just once)
>
>So, technically speaking, unless an integration project is willing to
>build and maintain its own artifacts, it is impossible to do any
>preventive validation.
>
>Which brings me to my next question: how do you guys address
>"Integration is high on the list of *every* release"? Again, please
>don't get me wrong - I am not looking to lay blame on or corner
>anyone - I am genuinely curious and would appreciate the input.
>
>
>Vinod:
>
>> As you yourself noted later, the pain is part of the 'alpha' status
>> of the release. We are targeting one of the immediate future
>> releases to be a beta, and so these troubles are really only the
>> short term.
>
>I don't really want to get into a discussion about what constitutes
>the alpha and how it has delayed the adoption of the Hadoop 2 line.
>However, I want to point out that it is especially important for an
>"alpha" platform to work nicely with downstream consumers of said
>platform. For quite obvious reasons, I believe.
>
>> I think there is a fundamental problem with the interaction of
>> Bigtop with the downstream projects, if nothing else, with
>
>BigTop is as downstream as it can get, because BigTop essentially
>consumes all other component releases in order to produce a viable
>stack. Technicalities aside...
>
>> Hadoop. We never formalized the process: will BigTop step in
>> after an RC is up for vote, or before? As I see it, it's happening
>
>BigTop can essentially give any component, including Hadoop - and
>better yet, the set of components - certain guarantees about
>compatibility and dependencies being included. A case in point is the
>commons libraries missed in the 1.0.1 release, which essentially
>prevented HBase from working properly.
>
>> after the vote is up, so no wonder we are in this state. Shall we
>> have a pre-notice to Bigtop so that it can step in before?
>
>The above contradicts the earlier statement that "Integration is high
>on the list of *every* release". If BigTop isn't used for integration
>testing, then how is said integration testing performed? Is it some
>sort of test-patch process, as Luke mentioned earlier? And why does it
>leave room for integration issues to go uncaught? Again, I am
>genuinely interested to know.
>
>> these short term pains. I'd rather like us to swim through these now
>> instead of supporting broken APIs and features in our beta, having
>> seen this very thing happen with 1.*.
>
>I think you're mixing up the point of integration with downstream and
>being in an alpha phase of development.
>The former isn't about supporting "broken APIs" - it is about being
>consistent and avoiding breaking the downstream applications without
>letting said applications accommodate the platform changes first.
>
>Changes in the API, after all, can be relatively easily traced by
>integration validation - this is the whole point of integration
>testing. And BigTop does the job better than anything around, simply
>because there's nothing else around to do it.
>
>If you stay in a shape-shifting "alpha" that doesn't integrate well
>for a very long time, you risk losing downstream customers' interest,
>because they might get tired of waiting until the next stable API is
>ready for them.
>
>> Let's fix the way the release-related communication is happening
>> across our projects so that we can all work together and make 2.X a
>> success.
>
>This is a very good point indeed! Let's start a separate discussion
>thread on how we can improve the release model for the coming Hadoop
>releases, where we - as the community - can provide better guarantees
>of inter-component compatibility (sorry for an overused word).
>
>Cos
>
>On Fri, Mar 01, 2013 at 10:58AM, Arun C Murthy wrote:
>> I feel this is being blown out of proportion.
>>
>> Integration is high on the list of *every* release. In the future,
>> if anyone or BigTop wants to help, running integration tests on a
>> Hadoop RC and providing feedback would be very welcome. I'm pretty
>> sure I would stop an RC and re-spin it if it broke Oozie or HBase or
>> Pig or Hive. For example, see the recent efforts to do a
>> 2.0.4-alpha.
>>
>> With hadoop-2.0.3-alpha we discovered 3 *bugs* - making it sound
>> like we intentionally disregard integration issues is very harsh.
>>
>> Please also see the other thread where we discussed stabilizing
>> APIs, protocols etc. for the next 'beta' release.
>>
>> Arun
>>
>> On Feb 26, 2013, at 5:43 PM, Roman Shaposhnik wrote:
>>
>> > Hi!
>> >
>> > For the past couple of releases of the Hadoop 2.X code line, the
>> > issue of integration between Hadoop and its downstream projects
>> > has become quite thorny. The poster child here is Oozie, where
>> > every release of Hadoop 2.X seems to break compatibility in
>> > various unpredictable ways. At times other components (such as
>> > HBase, for example) also seem to be affected.
>> >
>> > Now, to be extremely clear -- I'm NOT talking about the *latest*
>> > version of Oozie working with the *latest* version of Hadoop;
>> > instead, my observations come from running previous *stable*
>> > releases of Bigtop on top of Hadoop 2.X RCs.
>> >
>> > As many of you know, Apache Bigtop aims at providing a single
>> > platform for the integration of Hadoop and Hadoop ecosystem
>> > projects. As such, we're uniquely positioned to track
>> > compatibility between different Hadoop releases with regard to
>> > the downstream components (things like Oozie, Pig, Hive, Mahout,
>> > etc.). For every single RC we've been pretty diligent about
>> > trying to provide integration-level feedback on the quality of
>> > the upcoming release, but it seems that our efforts don't quite
>> > suffice in stabilizing Hadoop 2.X.
>> >
>> > Of course, one could argue that while the Hadoop 2.X code line
>> > was designated 'alpha', expecting much in the way of perfect
>> > integration and compatibility was NOT what the Hadoop community
>> > was focusing on. I can appreciate that view, but what I'm
>> > interested in is the future of Hadoop 2.X, not its past. Hence,
>> > here's my question to all of you as the Hadoop community at
>> > large:
>> >
>> > Do you think that the project has reached a point where
>> > integration and compatibility issues should be prioritized really
>> > high on the list of things that make or break each future
>> > release?
>> >
>> > The good news is that Bigtop's charter is in big part *exactly*
>> > about providing you with this kind of feedback.
>> > We can easily tell you when Hadoop behavior, with regard to
>> > downstream components, changes between a previous stable release
>> > and the new RC (or even branch/trunk). What we can NOT do is
>> > submit patches for all the issues. We are simply too small a
>> > project, and we need your help with that.
>> >
>> > I truly believe that we owe it to the downstream projects, and in
>> > the second half of this email I will try to convince you of that.
>> >
>> > We all know that integration projects are impossible to pull off
>> > unless there's a general consensus among all of the projects
>> > involved that they indeed need to work with each other. You can
>> > NOT force that notion, but you can always try to influence it.
>> > This relationship goes both ways.
>> >
>> > Consider the question in front of the downstream communities of
>> > whether or not to adopt Hadoop 2.X as the basis. To answer that
>> > question, each downstream project has to be reasonably sure that
>> > their concerns will NOT fall on deaf ears and that Hadoop
>> > developers are, essentially, 'ready' for them to pick up Hadoop
>> > 2.X. I would argue that so far the Hadoop community has gone out
>> > of its way to signal that the 2.X codeline is NOT ready for the
>> > downstream.
>> >
>> > I would argue that moving forward this is a really unfortunate
>> > situation that may end up undermining the long-term success of
>> > Hadoop 2.X if we don't start addressing the problem. Think about
>> > it -- 90% of the unit tests that run downstream on Apache
>> > infrastructure are still exercising Hadoop 1.X underneath. In
>> > fact, if you were to forcefully make, let's say, HBase's unit
>> > tests run on top of Hadoop 2.X, quite a few of them are going to
>> > fail. The Hadoop community is, in effect, cutting itself off from
>> > its biggest source of feedback -- its downstream users. This in
>> > turn:
>> >
>> > * leaves the Hadoop project in a perpetual state of broken
>> >   windows syndrome.
>> >
>> > * leaves Apache Hadoop 2.X releases in a state considerably
>> >   inferior to the vendor releases that *include* Apache Hadoop.
>> >   Users have no choice but to align themselves with vendor
>> >   offerings if they wish to utilize the latest Hadoop
>> >   functionality. The artifact known as Apache Hadoop 2.X has
>> >   stopped being a viable choice, thus fracturing the user
>> >   community and reducing the benefits of a commonly deployed
>> >   codebase.
>> >
>> > * leaves the downstream projects of Hadoop in a jaded state where
>> >   they legitimately get very discouraged and frustrated and
>> >   eventually give up, thinking: well, we work with one release of
>> >   Hadoop (the stable Hadoop 1.X) and we shall wait for the Hadoop
>> >   community to get its act together.
>> >
>> > In my view (shared by quite a few members of Apache Bigtop), we
>> > can definitely do better than this if we all agree that the
>> > proposed first 'beta' release of Hadoop 2.0.4 is the right time
>> > for it to happen.
>> >
>> > It is about time the Hadoop 2.X community won back all those end
>> > users and downstream projects that got left behind during the
>> > alpha stabilization phase.
>> >
>> > Thanks,
>> > Roman.
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
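For reference, the branch-2 SNAPSHOT artifacts that Bobby's Jenkins job publishes are the kind of development artifacts Cos says a downstream project needs. A minimal sketch of how a downstream pom could consume them follows; the repository URL is the standard Apache snapshots location, while the `2.0.4-SNAPSHOT` version and the choice of `hadoop-common` are illustrative assumptions, not details taken from this thread.

```xml
<!-- Sketch of a downstream pom.xml fragment. Assumptions: the
     standard Apache snapshots repository URL, and 2.0.4-SNAPSHOT as
     an example branch-2 development version. -->
<repositories>
  <repository>
    <id>apache.snapshots</id>
    <name>Apache Snapshots Repository</name>
    <url>https://repository.apache.org/content/repositories/snapshots/</url>
    <!-- Pull snapshots only from here; releases come from Central. -->
    <releases>
      <enabled>false</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <!-- Example version; use the branch-2 SNAPSHOT actually published. -->
    <version>2.0.4-SNAPSHOT</version>
  </dependency>
</dependencies>
```

Running `mvn -U verify` against such a pom forces Maven to check the snapshots repository for artifacts updated since the last commit-triggered publish, which is what makes per-commit preventive validation possible at all.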
