Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "Hbase/MasterRewrite" page has been changed by stack.
http://wiki.apache.org/hadoop/Hbase/MasterRewrite?action=diff&rev1=11&rev2=12

--------------------------------------------------

   * Distributes out administered close, flush, compact messages
   * Watches ZK for its own lease and for regionservers so knows when to run recovery

+ After implementation of this design, master will do all of above except manage schema and distribute out messages to close, flush, etc. Any client can do the latter by manipulating zk (we can add acl checks later). Remaining master tasks will be less prone to error and run snappier because no longer based on messaging carried atop periodic heartbeats from regionservers.
+
  <<Anchor(problems)>>
  == Problems with current Master ==

@@ -44, +46 @@

   1. Each regionserver carries 100 regions of 1G each (100k regions =~ 100TB)

  <<Anchor(design)>>
+ == Design ==
- == Design ==
  <<Anchor(moveall)>>
+ === Move all state, state transitions, and schema to go via zookeeper ===
+ Currently state transitions are done inside master shuffling between Maps triggered by messages carried on the back of regionserver heartbeats. Move all to zookeeper.
- === Move all state, state transitions, and schema to go via zookeeper ===
+ <<Anchor(tablestate)>>
+ ==== Table State ====
- Tables are offlined, onlined, made read-only, and dropped (Add freeze of flushes and compactions state to facilitate snapshotting). Currently HBase Master does this by messaging regionservers. Instead move state to zookeeper. Let regionservers watch for changes and react. Allow that a cluster may have up to 100 tables. Tables are made of regions. There may be thousands of regions per table. A regionserver could be carrying a region from each of the 100 tables. TODO: Should regionserver have a table watcher or a watcher per region?
+ Tables are offlined, onlined, made read-only, and dropped (Add freeze of flushes and compactions state to facilitate snapshotting). Currently HBase Master does this by messaging regionservers. Instead move state to zookeeper. Let regionservers watch for changes and react. Allow that a cluster may have up to 100 tables. Tables are made of regions. There may be thousands of regions per table. A regionserver could be carrying a region from each of the 100 tables.
- Tables have schema. Tables are made of column families. Column families have schema/attributes. Column families can be added and removed. Currently the schema is written into a column in the .META. catalog family. Move all schema to zookeeper. Regionservers would have watchers on schema and would react to changes. TODO: A watcher per column family or a watcher per table or a watcher on the parent directory for schema?
+ Tables have schema. Tables are made of column families. Column families have schema/attributes. Column families can be added and removed. Currently the schema is written into a column in the .META. catalog family. Move all schema to zookeeper. Regionservers would have watchers on schema and would react to changes.
+
+ In a tables znode up in zk, keep a file that lists, for each table on the cluster, its current state attributes -- read-only, no-flush -- and that table's schema, all in JSON. Only the differences from default are up in zk. All regionservers keep watch on this znode; when it changes, they spin through their list of regions, reconciling each with the current content of the tables znode.
+
+ <<Anchor(regionstate)>>
+ ==== Region State ====
  Run region state transitions -- i.e. opening, closing -- by changing state in zookeeper rather than in Master maps as is currently done.

@@ -74, +84 @@

  # Is STARTCODE a timestamp or a random id?
  /hbase/rs/STARTCODE/load/
  /hbase/rs/STARTCODE/regions/opening/
+ /hbase/tables/TABLENAME {JSON array of table objects. Each table object would have state and schema objects, etc. State is read-only, offline, etc. Schema has differences from default only}
- /hbase/tables/TABLENAME/schema/attributes serialized as JSON # These are table attributes. Distinct from state flags such as read-only.
- /hbase/tables/TABLENAME/schema/families/FAMILYNAME/attributes serialized as JSON
- /hbase/tables/TABLENAME/state/attribute # Can have only one attribute at a time? E.g. Read-only implies online and no flush/compaction. Allow support for multiple.
  }}}

  <<Anchor(clean)>>
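
The tables-znode idea in the diff above (state and schema stored as JSON, only differences from default, regionservers reconciling their regions on change) might be sketched roughly as follows. This is an illustration only, not code from HBase: the attribute names (`online`, `read_only`, `no_flush`), the `name` key, the default values, the action strings, and both function names are assumptions, and the znode content is modelled as a plain byte string rather than read through a ZooKeeper client.

```python
import json

# Hypothetical defaults; the design does not name the actual attributes.
DEFAULT_STATE = {"online": True, "read_only": False, "no_flush": False}
DEFAULT_SCHEMA = {"families": {}}


def effective_tables(znode_bytes):
    """Overlay each table's stored differences on the defaults."""
    tables = {}
    for obj in json.loads(znode_bytes.decode("utf-8")):
        tables[obj["name"]] = {
            "state": {**DEFAULT_STATE, **obj.get("state", {})},
            "schema": {**DEFAULT_SCHEMA, **obj.get("schema", {})},
        }
    return tables


def reconcile(hosted_regions, tables):
    """Decide what each hosted region should do given current table state.

    hosted_regions maps a region name to the name of its table.
    """
    actions = {}
    for region, table in hosted_regions.items():
        entry = tables.get(table)
        if entry is None or not entry["state"]["online"]:
            actions[region] = "close"          # table dropped or offlined
        elif entry["state"]["read_only"]:
            actions[region] = "reject_writes"  # table made read-only
        else:
            actions[region] = "serve"
    return actions


# Example znode content: only non-default values are stored.
ZNODE = json.dumps([
    {"name": "t1", "state": {"read_only": True}},
    {"name": "t2", "state": {"online": False}},
    {"name": "t3"},
]).encode("utf-8")

if __name__ == "__main__":
    hosted = {"t1,r1": "t1", "t2,r1": "t2", "t3,r1": "t3"}
    print(reconcile(hosted, effective_tables(ZNODE)))
```

In a real regionserver the reconcile step would presumably run from a watch callback on the znode; that wiring is left out here because the diff does not specify it.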
