On 4 May 2013 18:38, Roman Shaposhnik <[email protected]> wrote: > On Sat, May 4, 2013 at 3:33 PM, Tsz Wo Sze <[email protected]> wrote: >> The proposal sounds like an ideal solution but it is impractical. >> I think it is hard to make all API changes now and freezing them. >> Either it will just take a long time to finish the API changes, or >> we may miss some important API changes. > > In fact this was entire point of my comment wrt. high degree of > focus towards downstream components of anything that could > be potentially called a Hadoop beta release. > > The reality of the situation that we can't simply wish away is > that Hadoop is not Java -- it doesn't have a formal testsuite > along the lines of the TCK that can guarantee API stability.
I'm working on filesystem stuff, but that's only tested at the unit test level, "does it match the requirements", which is different from "does it work with Pig". > We don't have that. Hence we might as well use the next > best thing -- tons of code implemented downstream that > actually exercise Hadoop APIs. I'm in favour of this, and do want to tease out some of the swift-only scale tests for a bigtop patch that we can apply to all filesystems -they are good at finding bugs. For example, if you create a few thousand files in a blobstore then try to delete them, that's when you discover the RESTy endpoints throttle the operations and your code starts failing unless you insert self-throttling and extend socket timeouts. Those are the fun things you want to find sooner rather than later. > > Perhaps a shift of perspective is needed on the part of > Hadoop community -- we should stop looking at downstream > as just downstream and start looking at it as a de-facto > TCK. If we assume that vantage point then things like > making sure that there are regular Unit tests runs clearly > become something that useful to Hadoop directly, not > a 'downstream problem'. This is something we need to start working on, with Jenkins picking up problems sooner rather than later. for it to work -and I say this based on experience in multi-team projects- everyone needs to care about problems downstream. What I don't want to do -and am -voting against, is Kos's proposal: 1 (non-binding) There are some things in there -windows support- that shouldn't be reverted as Kos wants, and I don't want to go near the Hadoop numbering scheme again. 2.x is 2.x and let's not change our mind now. As Nicholas says, we need to get some of the protocols and APIs stable so they can stay forwards compatible. We've also had problems with HDFS changes in the 2.0 alphas; those wire things need to be locked down as well as -some- of the APIs. I'm (mostly) happy with management layer stuff changing, but not the code that users and downstream apps use. That doesn't mean 2.x needs to be feature complete, only that everyone in the core hadoop team is happy that it's got the hooks in there for the roadmap, and make sure that HDFS v2 is stable and trusted. Because if you look at the rate of change of Linux, you can say "they alternate stable/unstable every six months", but the development of ext4 is much more cautious -it has the same requirement as HDFS: preserve data you care about. We should get it out the door, then start to get that bigtop stack doing the nightly build/test, with failures being reported back to hadoop-common as well as bigtop. As a reference point, Apache gump was the first "build everything from SCM" Java project in the ASF, checking out everything, bootstrapping Ant off its build.sh script and the JDK, then going into the XML parsers and logging, before trying to do the rest of the DAG of Java projects. Its goal was portrayed as "nightly build and tests for your project", but the Ant team viewed it as our regression tests. Sadly Maven killed it by making it near-impossible to build an up to date dependency graph. If we have HDFS stable then YARN and MR can roll at a faster rate, along with the other downstream parts. One other thing I want to pick up on is that testing "at Y! scale" isn't sufficient to get all bugs ironed out -if you look at what surfaces outside its the people whose networks are a badly configured mess (mine included), virtualised with very, very odd networks, clock drift and RAM paging out under the OS's radar, or very different uses. Those are the problems that will show up in the -beta phase, precisely because -alpha isn't viewed as safe/stable. I expect some fun bugs to surface there -bugs we aren't going to see in bigtop either, because the kind of person who is trying to set up a cluster using the IP address pool 127.0.1.* isn't the kind of person who is going to check out bigtop and run nightly tests. -Steve
