On Wed, Mar 11, 2015 at 3:07 PM, Sean Busbey <bus...@cloudera.com> wrote:
> On Wed, Mar 11, 2015 at 4:49 PM, Enis Söztutar <enis....@gmail.com> wrote:
>
> > > It's worth noting that if users follow our ref guide (which says to
> > > use "hadoop jar"), then jobs don't fail. It's only when they attempt
> > > to launch jobs using "hbase com.example.MyDriver" that things fail.
> > >
> > > Additionally, if we stick to telling users that only the "hadoop jar"
> > > version is supported, we can rely on the application classpath support
> > > built into Hadoop 2.6+ to make it so jobs built on us get our
> > > dependency version and not the ones from Hadoop as it changes.
> >
> > We have learned that the users do not read or follow documentation.
> > And it is a regression if launching job using hbase command does not
> > work.
>
> They do when things break. ;) An additional troubleshooting section that
> shows the error and says "remember to use hadoop jar" would nicely help
> catch searchers.
>
> Furthermore, "hadoop jar" is how you're supposed to launch YARN apps. If
> we say that doing things via the hbase command is acceptable, we're
> opening ourselves up to an expansion of what the hbase command has to do.
> (i.e. perhaps it should detect if the passed class is a YARN driver and
> then use the hadoop jar command? or should it always pass through to the
> hadoop jar command?)

Traditionally, and in our documentation, HBase-owned MR classes (CopyTable,
Import, etc.) are run with the hbase script, not the hadoop script. It is
still a regression in that sense. Yes, there is a workaround, but why
bother when we can fix this easily?

> > > > So, my proposal is:
> > > > - Commit HBASE-13149 to master and 1.1
> > > > - Either change the dependency compat story for minor versions to
> > > >   false, or add a footnote saying that there may be exceptions
> > > >   because of the reasons listed above.
> > >
> > > If we decide we need to do the jackson version bump, what about the
> > > possibility of moving the code in branch-1 to be version 2.0.0 (and
> > > making master 3.0.0). We could start the release process once the
> > > changes Andrew needs for Phoenix are in place and get it out the door.
> >
> > I don't think this requires a major version bump. As I was mentioning
> > in the other thread, HBase is not upgraded too frequently in
> > production. Again, we do not want to inconvenience the user even
> > further.
>
> How would this inconvenience users further? Barring the change in version
> numbers, it's the same upgrade they would be doing to move to what we're
> currently calling HBase 1.1. Since version numbers under semver signal
> what we understand about our changeset, it's just us acknowledging that
> we broke some kind of compatibility. A release note that calls out the
> Jackson dependency as the cause for that compatibility breakage makes the
> evaluation easy.

The problem boils down to a "major versions are cheap" kind of argument,
which has been discussed in the Hadoop context. I do not buy it, because a
major version upgrade implies (though it does not have to be) a big change.
I don't see why we would ever want to bump our major version when the
library in question only bumped its minor version. Jackson could have gone
with 2.0 for those changes between 1.8 and 1.9. Why would we want to
promise more than what our dependencies promise? It is not realistic.

> In the current state of the code, we'd just need to make some
> documentation changes and then the same upgrade paths as for 1.1 should
> work just fine. Provided we don't take too long getting the release out,
> I'd expect many users would just upgrade from 0.98 to (the proposed)
> 2.0.0.
>
> (I mentioned the changes Andrew needs only because it's my understanding
> that those are the driving factor on branch-1 getting to release, not
> because I expect them to be breaking.)
>
> > > It would do a nice job of desensitizing us to major version
> > > increments and we'd be able to document it as a very safe major
> > > version upgrade since the only breakage is that dependency. We could
> > > then limit the HBase 1.y line to just 1.0.z and add a FAQ item if
> > > enough folks ask about why the sudden increment.
> >
> > Doing a major version just to update one dependency version is too much
> > I think.
>
> But that's the point of following semver and defining a compatibility
> document. The sufficient criteria for a major version bump expressly
> covers updating a single dependency in a non-breaking way.
>
> There will be plenty of major version numbers to go through. The thing
> that trips projects up is feeling like major version releases need to be
> special. If we want to do that, then we shouldn't use semver. We should
> define our own versioning standard and make it "Marketing, Major, Minor"
> instead of "Major, Minor, Patch." (I would prefer we not do this.)
>
> > > I'm -1 on the idea of exceptions for our compatibility story. We
> > > already note that just because we can break something doesn't mean we
> > > will. That does a good job of pointing out that we recognize there's
> > > a cost.
> >
> > We do not have to corner ourselves with the rules we have set. I can
> > see how requiring JDK-8 or Hadoop-3 etc will justify major versions.
> > But not a dependency library that users might be transitively depending
> > on. If that is the case, the user is expected to deal with it.
>
> If we want to treat those differently then we need to update our
> compatibility document to call out JVM and Hadoop support as a different
> thing than the rest of our dependency promises.

But we should not do this.

> So long as we are forcing applications that integrate with us to use
> particular versions of third party libraries, we make it much harder to
> upgrade when we don't provide stability.
>
> --
> Sean