This is a really good question. I think many operators will give a lot of leeway to data format changes as long as data can be copied from A to B (perhaps with a batch rewrite to upgrade it, though ideally that would not be required) and replication can be enabled to sync up to the current moment for cutover.

> On Aug 4, 2017, at 10:00 AM, Esteban Gutierrez <este...@cloudera.com> wrote:
>
> Should we add additional details around replication as well? for instance,
> shall we consider a hbase-1.x cluster as a client for a hbase-2.x cluster?
>
> Thanks for starting this discussion Stack,
>
> esteban.
>
> --
> Cloudera, Inc.
>
>
>> On Fri, Aug 4, 2017 at 1:05 AM, stack <saint....@gmail.com> wrote:
>>
>> Thanks Zach for clarification. Let me work up a list and then come back to
>> this thread. Jira needs an edit pass to.
>>
>> S
>>
>> On Aug 3, 2017 23:54, "Zach York" <zyork.contribut...@gmail.com> wrote:
>>
>> This kinda helps, but these seem more like expectations. I was going more
>> for things like HFile format changed, meta table structure changed,
>> coprocessor implementations changed (these are just examples, I don't know
>> if any of these actually changed).
>>
>> More technical differences between branch-1 and branch-2 which then can
>> help us get the right expectations for compatibility.
>>
>>> On Wed, Aug 2, 2017 at 6:34 PM, Stack <st...@duboce.net> wrote:
>>>
>>> On Wed, Aug 2, 2017 at 5:25 PM, Zach York <zyork.contribut...@gmail.com>
>>> wrote:
>>>
>>>> Do we know what the major pain points for migration are? Can we discuss
>>>> that/get a list going?
>>>>
>>>>
>>> Here's a few in outline:
>>>
>>> + There is issue of formats, of hbase-2.x being able to read hbase-1.x
>>> data whether from HDFS or ZooKeeper or off the wire.
>>> + An hbase-1.x client should be able to Get/Put and Scan an hbase-2.x
>>> cluster; no holes in the API or unintelligible serializations.
>>> + There is then the little dance that has us rolling restart from an
>>> hbase-1.x cluster to hbase-2.x; i.e. upgrade master first and then it will
>>> assign regions to the new hbase-2.x regionservers as they come on line.
>>> TBD.
>>>
>>> Is this what you mean sir?
>>>
>>> S
>>>
>>>
>>>> I think without that knowledge it is hard (for me at least :) ) to
>>>> determine where we should set our sights in terms of migration.
>>>>
>>>> Thanks,
>>>> Zach
>>>>
>>>>> On Wed, Aug 2, 2017 at 4:38 PM, Stack <st...@duboce.net> wrote:
>>>>>
>>>>> What are our expectations regards compatibility between hbase1 and
>>>>> hbase2?
>>>>>
>>>>> Lets have a chat about it. Here are some goal posts.
>>>>>
>>>>> + You have to upgrade to hbase-1.x before you can migrate to hbase-2. No
>>>>> migration from < hbase-1 (Is this too onerous? Should we support 0.98 =>
>>>>> 2.0?).
>>>>> + You do NOT have to upgrade to the latest release of hbase1 to migrate to
>>>>> hbase2; being up on hbase-1.0.0+ will be sufficient.
>>>>> + You'll have to update your hbase1 coprocessors to deploy them on hbase2.
>>>>> A bunch of CP API has/will change by the time hbase2 comes out; e.g.
>>>>> watching for region split on RegionServer no longer makes sense given
>>>>> Master runs all splits now.
>>>>> + An hbase1 client can run against an hbase2 cluster but it will only be
>>>>> able to do DML (Get/Put/Scan, etc.). We do not allow being able to do admin
>>>>> ops using an hbase1 Admin client against an hbase2 cluster. We have some
>>>>> egregious API violations in branch-1; e.g. we have protobuf in our API (See
>>>>> HBASE-15607). The notion is that we can't afford a deprecation cycle
>>>>> purging this stuff from our Admin API.
>>>>>
>>>>> What you all think?
>>>>>
>>>>> St.Ack
>>>>>
>>>>
>>>
>>
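
For what it's worth, the goal posts quoted above can be written down as a couple of small checks. This is only a rough sketch in Python of the proposed rules as I read them; the function names and the operation set are hypothetical and do not correspond to anything in HBase itself, and the DML set covers only the operations named in the thread (the "etc." is left out).

```python
# Illustrative only: encodes the *proposed* hbase1 -> hbase2 goal posts
# from the thread. All names here are hypothetical, not HBase APIs.

# DML operations explicitly named in the thread (Get/Put/Scan).
DML_OPS = {"get", "put", "scan"}


def major_version(version):
    """Return the major component of a version string such as '1.2.6'."""
    return int(version.split(".")[0])


def can_migrate_to_hbase2(current_version):
    """Proposed rule: any hbase-1.x release (1.0.0+) may migrate to hbase-2.

    Releases older than hbase-1 (for example 0.98) must first upgrade
    to some hbase-1.x release.
    """
    return major_version(current_version) == 1


def hbase1_client_op_allowed(op):
    """Proposed rule: an hbase1 client against an hbase2 cluster is DML-only.

    Admin operations from an hbase1 Admin client are not supported.
    """
    return op.lower() in DML_OPS
```

So under these assumptions a cluster on 1.0.0 or 1.2.6 could go straight to hbase-2, a 0.98 cluster could not, and an hbase1 client could Scan but not run admin operations against the new cluster.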