Author: stack
Date: Wed Sep 19 09:45:01 2007
New Revision: 577355

URL: http://svn.apache.org/viewvc?rev=577355&view=rev
Log:
HADOOP-1920 Wrapper scripts broken when hadoop in one location and hbase in another
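The scenario behind this fix is an hbase checkout that lives outside the hadoop install. A minimal sketch of that layout and of driving the wrappers from it follows; the paths are hypothetical, and exporting HADOOP_HOME/HBASE_HOME as a way to point hbase-config.sh at the two locations is an assumption about that helper, which is not shown in this diff.

  # Hypothetical layout: hadoop and hbase installed in separate places.
  export HADOOP_HOME=/opt/hadoop        # hadoop install
  export HBASE_HOME=/opt/hbase          # separate hbase checkout

  # Run the wrappers from the hbase checkout; after this change they
  # resolve the hbase jar and conf dirs against both homes.
  cd "$HBASE_HOME"
  bin/start-hbase.sh
  # ... later ...
  bin/stop-hbase.sh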
Modified:
    lucene/hadoop/trunk/src/contrib/hbase/CHANGES.txt
    lucene/hadoop/trunk/src/contrib/hbase/README.txt
    lucene/hadoop/trunk/src/contrib/hbase/bin/hbase
    lucene/hadoop/trunk/src/contrib/hbase/bin/hbase-config.sh
    lucene/hadoop/trunk/src/contrib/hbase/bin/hbase-daemon.sh
    lucene/hadoop/trunk/src/contrib/hbase/bin/hbase-daemons.sh
    lucene/hadoop/trunk/src/contrib/hbase/bin/regionservers.sh
    lucene/hadoop/trunk/src/contrib/hbase/bin/start-hbase.sh
    lucene/hadoop/trunk/src/contrib/hbase/bin/stop-hbase.sh

Modified: lucene/hadoop/trunk/src/contrib/hbase/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/CHANGES.txt?rev=577355&r1=577354&r2=577355&view=diff
==============================================================================
--- lucene/hadoop/trunk/src/contrib/hbase/CHANGES.txt (original)
+++ lucene/hadoop/trunk/src/contrib/hbase/CHANGES.txt Wed Sep 19 09:45:01 2007
@@ -48,8 +48,10 @@
    HADOOP-1870 Once file system failure has been detected, don't check it
                again and get on with shutting down the hbase cluster.
    HADOOP-1888 NullPointerException in HMemcacheScanner (reprise)
-   HADOOP-1903 Possible data loss if Exception happens between snapshot and flush
-               to disk.
+   HADOOP-1903 Possible data loss if Exception happens between snapshot and
+               flush to disk.
+   HADOOP-1920 Wrapper scripts broken when hadoop in one location and hbase in
+               another

 IMPROVEMENTS
    HADOOP-1737 Make HColumnDescriptor data publically members settable

Modified: lucene/hadoop/trunk/src/contrib/hbase/README.txt
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/README.txt?rev=577355&r1=577354&r2=577355&view=diff
==============================================================================
--- lucene/hadoop/trunk/src/contrib/hbase/README.txt (original)
+++ lucene/hadoop/trunk/src/contrib/hbase/README.txt Wed Sep 19 09:45:01 2007
@@ -1,285 +1 @@
-HBASE
-Michael Cafarella
-
-
-This document gives a quick overview of HBase, the Hadoop simple
-database. It is extremely similar to Google's BigTable, with just a
-few differences. If you understand BigTable, great. If not, you should
-still be able to understand this document.
-
----------------------------------------------------------------
-I.
-
-HBase uses a data model very similar to that of BigTable. Users store
-data rows in labelled tables. A data row has a sortable key and an
-arbitrary number of columns. The table is stored sparsely, so that
-rows in the same table can have crazily-varying columns, if the user
-likes.
-
-A column name has the form "<group>:<label>" where <group> and <label>
-can be any string you like. A single table enforces its set of
-<group>s (called "column groups"). You can only adjust this set of
-groups by performing administrative operations on the table. However,
-you can use new <label> strings at any write without preannouncing
-it. HBase stores column groups physically close on disk. So the items
-in a given column group should have roughly the same write/read
-behavior.
-
-Writes are row-locked only. You cannot lock multiple rows at once. All
-row-writes are atomic by default.
-
-All updates to the database have an associated timestamp. The HBase
-will store a configurable number of versions of a given cell. Clients
-can get data by asking for the "most recent value as of a certain
-time". Or, clients can fetch all available versions at once.
-
----------------------------------------------------------------
-II.
-
-To the user, a table seems like a list of data tuples, sorted by row
-key. Physically, tables are broken into HRegions. An HRegion is
-identified by its tablename plus a start/end-key pair. A given HRegion
-with keys <start> and <end> will store all the rows from (<start>,
-<end>]. A set of HRegions, sorted appropriately, forms an entire
-table.
-
-All data is physically stored using Hadoop's DFS. Data is served to
-clients by a set of HRegionServers, usually one per machine. A given
-HRegion is served by only one HRegionServer at a time.
-
-When a client wants to make updates, it contacts the relevant
-HRegionServer and commits the update to an HRegion. Upon commit, the
-data is added to the HRegion's HMemcache and to the HRegionServer's
-HLog. The HMemcache is a memory buffer that stores and serves the
-most-recent updates. The HLog is an on-disk log file that tracks all
-updates. The commit() call will not return to the client until the
-update has been written to the HLog.
-
-When serving data, the HRegion will first check its HMemcache. If not
-available, it will then check its on-disk HStores. There is an HStore
-for each column family in an HRegion. An HStore might consist of
-multiple on-disk HStoreFiles. Each HStoreFile is a B-Tree-like
-structure that allows for relatively fast access.
-
-Periodically, we invoke HRegion.flushcache() to write the contents of
-the HMemcache to an on-disk HStore's files. This adds a new HStoreFile
-to each HStore. The HMemcache is then emptied, and we write a special
-token to the HLog, indicating the HMemcache has been flushed.
-
-On startup, each HRegion checks to see if there have been any writes
-to the HLog since the most-recent invocation of flushcache(). If not,
-then all relevant HRegion data is reflected in the on-disk HStores. If
-yes, the HRegion reconstructs the updates from the HLog, writes them
-to the HMemcache, and then calls flushcache(). Finally, it deletes the
-HLog and is now available for serving data.
-
-Thus, calling flushcache() infrequently will be less work, but
-HMemcache will consume more memory and the HLog will take a longer
-time to reconstruct upon restart. If flushcache() is called
-frequently, the HMemcache will take less memory, and the HLog will be
-faster to reconstruct, but each flushcache() call imposes some
-overhead.
-
-The HLog is periodically rolled, so it consists of multiple
-time-sorted files. Whenever we roll the HLog, the HLog will delete all
-old log files that contain only flushed data. Rolling the HLog takes
-very little time and is generally a good idea to do from time to time.
-
-Each call to flushcache() will add an additional HStoreFile to each
-HStore. Fetching a file from an HStore can potentially access all of
-its HStoreFiles. This is time-consuming, so we want to periodically
-compact these HStoreFiles into a single larger one. This is done by
-calling HStore.compact().
-
-Compaction is a very expensive operation. It's done automatically at
-startup, and should probably be done periodically during operation.
-
-The Google BigTable paper has a slightly-confusing hierarchy of major
-and minor compactions. We have just two things to keep in mind:
-
-1) A "flushcache()" drives all updates out of the memory buffer into
-   on-disk structures. Upon flushcache, the log-reconstruction time
-   goes to zero. Each flushcache() will add a new HStoreFile to each
-   HStore.
-
-2) a "compact()" consolidates all the HStoreFiles into a single
-   one. It's expensive, and is always done at startup.
-
-Unlike BigTable, Hadoop's HBase allows no period where updates have
-been "committed" but have not been written to the log. This is not
-hard to add, if it's really wanted.
-
-We can merge two HRegions into a single new HRegion by calling
-HRegion.closeAndMerge(). We can split an HRegion into two smaller
-HRegions by calling HRegion.closeAndSplit().
-
-OK, to sum up so far:
-
-1) Clients access data in tables.
-2) tables are broken into HRegions.
-3) HRegions are served by HRegionServers. Clients contact an
-   HRegionServer to access the data within its row-range.
-4) HRegions store data in:
-   a) HMemcache, a memory buffer for recent writes
-   b) HLog, a write-log for recent writes
-   c) HStores, an efficient on-disk set of files. One per col-group.
-      (HStores use HStoreFiles.)
-
----------------------------------------------------------------
-III.
-
-Each HRegionServer stays in contact with the single HBaseMaster. The
-HBaseMaster is responsible for telling each HRegionServer what
-HRegions it should load and make available.
-
-The HBaseMaster keeps a constant tally of which HRegionServers are
-alive at any time. If the connection between an HRegionServer and the
-HBaseMaster times out, then:
-
-  a) The HRegionServer kills itself and restarts in an empty state.
-
-  b) The HBaseMaster assumes the HRegionServer has died and reallocates
-     its HRegions to other HRegionServers
-
-Note that this is unlike Google's BigTable, where a TabletServer can
-still serve Tablets after its connection to the Master has died. We
-tie them together, because we do not use an external lock-management
-system like BigTable. With BigTable, there's a Master that allocates
-tablets and a lock manager (Chubby) that guarantees atomic access by
-TabletServers to tablets. HBase uses just a single central point for
-all HRegionServers to access: the HBaseMaster.
-
-(This is no more dangerous than what BigTable does. Each system is
-reliant on a network structure (whether HBaseMaster or Chubby) that
-must survive for the data system to survive. There may be some
-Chubby-specific advantages, but that's outside HBase's goals right
-now.)
-
-As HRegionServers check in with a new HBaseMaster, the HBaseMaster
-asks each HRegionServer to load in zero or more HRegions. When the
-HRegionServer dies, the HBaseMaster marks those HRegions as
-unallocated, and attempts to give them to different HRegionServers.
-
-Recall that each HRegion is identified by its table name and its
-key-range. Since key ranges are contiguous, and they always start and
-end with NULL, it's enough to simply indicate the end-key.
-
-Unfortunately, this is not quite enough. Because of merge() and
-split(), we may (for just a moment) have two quite different HRegions
-with the same name. If the system dies at an inopportune moment, both
-HRegions may exist on disk simultaneously. The arbiter of which
-HRegion is "correct" is the HBase meta-information (to be discussed
-shortly). In order to distinguish between different versions of the
-same HRegion, we also add a unique 'regionId' to the HRegion name.
-
-Thus, we finally get to this identifier for an HRegion:
-
-tablename + endkey + regionId.
-
-You can see this identifier being constructed in
-HRegion.buildRegionName().
-
-We can also use this identifier as a row-label in a different
-HRegion. Thus, the HRegion meta-info is itself stored in an
-HRegion. We call this table, which maps from HRegion identifiers to
-physical HRegionServer locations, the META table.
-
-The META table itself can grow large, and may be broken into separate
-HRegions. To locate all components of the META table, we list all META
-HRegions in a ROOT table. The ROOT table is always contained in a
-single HRegion.
-
-Upon startup, the HRegionServer immediately attempts to scan the ROOT
-table (because there is only one HRegion for the ROOT table, that
-HRegion's name is hard-coded). It may have to wait for the ROOT table
-to be allocated to an HRegionServer.
-
-Once the ROOT table is available, the HBaseMaster can scan it and
-learn of all the META HRegions. It then scans the META table. Again,
-the HBaseMaster may have to wait for all the META HRegions to be
-allocated to different HRegionServers.
-
-Finally, when the HBaseMaster has scanned the META table, it knows the
-entire set of HRegions. It can then allocate these HRegions to the set
-of HRegionServers.
-
-The HBaseMaster keeps the set of currently-available HRegionServers in
-memory. Since the death of the HBaseMaster means the death of the
-entire system, there's no reason to store this information on
-disk. All information about the HRegion->HRegionServer mapping is
-stored physically on different tables. Thus, a client does not need to
-contact the HBaseMaster after it learns the location of the ROOT
-HRegion. The load on HBaseMaster should be relatively small: it deals
-with timing out HRegionServers, scanning the ROOT and META upon
-startup, and serving the location of the ROOT HRegion.
-
-The HClient is fairly complicated, and often needs to navigate the
-ROOT and META HRegions when serving a user's request to scan a
-specific user table. If an HRegionServer is unavailable or it does not
-have an HRegion it should have, the HClient will wait and retry. At
-startup or in case of a recent HRegionServer failure, the correct
-mapping info from HRegion to HRegionServer may not always be
-available.
-
-In summary:
-
-1) HRegionServers offer access to HRegions (an HRegion lives at one
-   HRegionServer)
-2) HRegionServers check in with the HBaseMaster
-3) If the HBaseMaster dies, the whole system dies
-4) The set of current HRegionServers is known only to the HBaseMaster
-5) The mapping between HRegions and HRegionServers is stored in two
-   special HRegions, which are allocated to HRegionServers like any
-   other.
-6) The ROOT HRegion is a special one, the location of which the
-   HBaseMaster always knows.
-7) It's the HClient's responsibility to navigate all this.
-
-
----------------------------------------------------------------
-IV.
-
-What's the current status of all this code?
-
-As of this writing, there is just shy of 7000 lines of code in the
-"hbase" directory.
-
-All of the single-machine operations (safe-committing, merging,
-splitting, versioning, flushing, compacting, log-recovery) are
-complete, have been tested, and seem to work great.
-
-The multi-machine stuff (the HBaseMaster, the HRegionServer, and the
-HClient) have not been fully tested. The reason is that the HClient is
-still incomplete, so the rest of the distributed code cannot be
-fully-tested. I think it's good, but can't be sure until the HClient
-is done. However, the code is now very clean and in a state where
-other people can understand it and contribute.
-
-Other related features and TODOs:
-
-1) Single-machine log reconstruction works great, but distributed log
-   recovery is not yet implemented. This is relatively easy, involving
-   just a sort of the log entries, placing the shards into the right
-   DFS directories
-
-2) Data compression is not yet implemented, but there is an obvious
-   place to do so in the HStore.
-
-3) We need easy interfaces to MapReduce jobs, so they can scan tables
-
-4) The HMemcache lookup structure is relatively inefficient
-
-5) File compaction is relatively slow; we should have a more
-   conservative algorithm for deciding when to apply compaction.
-
-6) For the getFull() operation, use of Bloom filters would speed
-   things up
-
-7) We need stress-test and performance-number tools for the whole
-   system
-
-8) There's some HRegion-specific testing code that worked fine during
-   development, but it has to be rewritten so it works against an
-   HRegion while it's hosted by an HRegionServer, and connected to an
-   HBaseMaster. This code is at the bottom of the HRegion.java file.
-
+See http://wiki.apache.org/lucene-hadoop/Hbase

Modified: lucene/hadoop/trunk/src/contrib/hbase/bin/hbase
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/bin/hbase?rev=577355&r1=577354&r2=577355&view=diff
==============================================================================
--- lucene/hadoop/trunk/src/contrib/hbase/bin/hbase (original)
+++ lucene/hadoop/trunk/src/contrib/hbase/bin/hbase Wed Sep 19 09:45:01 2007
@@ -127,8 +127,16 @@
 IFS=

 # for releases, add core hbase, hadoop jar & webapps to CLASSPATH
+# Look in two places for our hbase jar.
+for f in $HBASE_HOME/hadoop-*-hbase*.jar; do
+  if [ -f $f ]; then
+    CLASSPATH=${CLASSPATH}:$f;
+  fi
+done
 for f in $HADOOP_HOME/contrib/hadoop-*-hbase*.jar; do
-  CLASSPATH=${CLASSPATH}:$f;
+  if [ -f $f ]; then
+    CLASSPATH=${CLASSPATH}:$f;
+  fi
 done
 if [ -d "$HADOOP_HOME/webapps" ]; then
   CLASSPATH=${CLASSPATH}:$HADOOP_HOME
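The point of the new "[ -f $f ]" test above is that an unmatched glob in sh/bash is passed through literally, so when the jar is not present in one of the two locations the raw pattern would otherwise be appended to CLASSPATH. A small self-contained sketch of that behavior, using a hypothetical empty directory, is below.

  # Sketch only: demonstrates the unmatched-glob case the guard protects against.
  mkdir -p /tmp/no-jars-here          # hypothetical dir containing no jars
  CLASSPATH=""
  for f in /tmp/no-jars-here/hadoop-*-hbase*.jar; do
    # without this test, $f would be the literal string
    # "/tmp/no-jars-here/hadoop-*-hbase*.jar" and would pollute CLASSPATH
    if [ -f $f ]; then
      CLASSPATH=${CLASSPATH}:$f
    fi
  done
  echo "CLASSPATH='${CLASSPATH}'"     # prints an empty value, not the raw pattern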
Modified: lucene/hadoop/trunk/src/contrib/hbase/bin/hbase-config.sh
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/bin/hbase-config.sh?rev=577355&r1=577354&r2=577355&view=diff
==============================================================================
--- lucene/hadoop/trunk/src/contrib/hbase/bin/hbase-config.sh (original)
+++ lucene/hadoop/trunk/src/contrib/hbase/bin/hbase-config.sh Wed Sep 19 09:45:01 2007
@@ -81,4 +81,4 @@
 # Allow alternate hbase conf dir location.
 HBASE_CONF_DIR="${HBASE_CONF_DIR:-$HBASE_HOME/conf}"
 # List of hbase regions servers.
-HBASE_REGIONSERVERS="${HBASE_REGIONSERVERS:-$HBASE_HOME/conf/regionservers}"
+HBASE_REGIONSERVERS="${HBASE_REGIONSERVERS:-$HBASE_CONF_DIR/regionservers}"

Modified: lucene/hadoop/trunk/src/contrib/hbase/bin/hbase-daemon.sh
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/bin/hbase-daemon.sh?rev=577355&r1=577354&r2=577355&view=diff
==============================================================================
--- lucene/hadoop/trunk/src/contrib/hbase/bin/hbase-daemon.sh (original)
+++ lucene/hadoop/trunk/src/contrib/hbase/bin/hbase-daemon.sh Wed Sep 19 09:45:01 2007
@@ -33,7 +33,9 @@
 #
 # Modelled after $HADOOP_HOME/bin/hadoop-daemon.sh

-usage="Usage: hbase-daemon.sh [--config=<hadoop-conf-dir>] [--hbaseconfig=<hbase-conf-dir>] <hbase-command> (start|stop) <args...>"
+usage="Usage: hbase-daemon.sh [--config=<hadoop-conf-dir>]\
+ [--hbaseconfig=<hbase-conf-dir>] <hbase-command> (start|stop)\
+ <args...>"

 # if no args specified, show usage
 if [ $# -le 1 ]; then
@@ -114,7 +116,10 @@
     hbase_rotate_log $log
     echo starting $command, logging to $log
-    nohup nice -n $HADOOP_NICENESS "$HBASE_HOME"/bin/hbase --config="${HADOOP_CONF_DIR}" --hbaseconfig="${HBASE_CONF_DIR}" $command $startStop "$@" > "$log" 2>&1 < /dev/null &
+    nohup nice -n $HADOOP_NICENESS "$HBASE_HOME"/bin/hbase \
+      --hadoop="${HADOOP_HOME}" \
+      --config="${HADOOP_CONF_DIR}" --hbaseconfig="${HBASE_CONF_DIR}" \
+      $command $startStop "$@" > "$log" 2>&1 < /dev/null &
     echo $! > $pid
     sleep 1; head "$log"
     ;;
@@ -123,7 +128,10 @@
     if [ -f $pid ]; then
       if kill -0 `cat $pid` > /dev/null 2>&1; then
         echo -n stopping $command
-        nohup nice -n $HADOOP_NICENESS "$HBASE_HOME"/bin/hbase --config="${HADOOP_CONF_DIR}" --hbaseconfig="${HBASE_CONF_DIR}" $command $startStop "$@" > "$log" 2>&1 < /dev/null &
+        nohup nice -n $HADOOP_NICENESS "$HBASE_HOME"/bin/hbase \
+          --hadoop="${HADOOP_HOME}" \
+          --config="${HADOOP_CONF_DIR}" --hbaseconfig="${HBASE_CONF_DIR}" \
+          $command $startStop "$@" > "$log" 2>&1 < /dev/null &
         while kill -0 `cat $pid` > /dev/null 2>&1; do
           echo -n "."
           sleep 1;
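hbase-daemon.sh launches the JVM detached (nohup ... &), records the child pid in a pid file, and later polls that pid with kill -0, which sends no signal and only tests whether the process exists. A minimal standalone sketch of that pid-file idiom follows; "sleep 600" stands in for the hbase JVM, the paths are hypothetical, and the plain kill at the end is only illustrative (the real stop branch re-invokes bin/hbase with a stop argument rather than signalling directly).

  # Sketch of the pid-file pattern used by hbase-daemon.sh (assumed stand-ins).
  log=/tmp/demo-daemon.log
  pid=/tmp/demo-daemon.pid

  # start: detach from the terminal and record the child's pid
  nohup nice -n 10 sleep 600 > "$log" 2>&1 < /dev/null &
  echo $! > "$pid"

  # liveness check: kill -0 only tests the pid, it delivers no signal
  if kill -0 `cat "$pid"` > /dev/null 2>&1; then
    echo "daemon running as pid `cat $pid`"
  fi

  # shut down, then wait for the process to disappear
  kill `cat "$pid"`
  while kill -0 `cat "$pid"` > /dev/null 2>&1; do
    echo -n "."
    sleep 1
  done
  echo " stopped"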
Modified: lucene/hadoop/trunk/src/contrib/hbase/bin/hbase-daemons.sh
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/bin/hbase-daemons.sh?rev=577355&r1=577354&r2=577355&view=diff
==============================================================================
--- lucene/hadoop/trunk/src/contrib/hbase/bin/hbase-daemons.sh (original)
+++ lucene/hadoop/trunk/src/contrib/hbase/bin/hbase-daemons.sh Wed Sep 19 09:45:01 2007
@@ -23,7 +23,10 @@
 # Run a Hadoop hbase command on all slave hosts.
 # Modelled after $HADOOP_HOME/bin/hadoop-daemons.sh

-usage="Usage: hbase-daemons.sh [--config=<confdir>] [--hbaseconfig=<hbase-confdir>] [--hosts=regionserversfile] command [start|stop] args..."
+usage="Usage: hbase-daemons.sh [--hadoop=<hadoop-home>]
+ [--config=<hadoop-confdir>] [--hbase=<hbase-home>]\
+ [--hbaseconfig=<hbase-confdir>] [--hosts=regionserversfile]\
+ command [start|stop] args..."

 # if no args specified, show usage
 if [ $# -le 1 ]; then
@@ -36,4 +39,8 @@

 . $bin/hbase-config.sh

-exec "$bin/regionservers.sh" --config="${HADOOP_CONF_DIR}" --hbaseconfig="${HBASE_CONF_DIR}" cd "$HBASE_HOME" \; "$bin/hbase-daemon.sh" --config="${HADOOP_CONF_DIR}" --hbaseconfig="${HBASE_CONF_DIR}" "$@"
+exec "$bin/regionservers.sh" --config="${HADOOP_CONF_DIR}" \
+  --hbaseconfig="${HBASE_CONF_DIR}" --hadoop="${HADOOP_HOME}" \
+  cd "${HBASE_HOME}" \; \
+  "$bin/hbase-daemon.sh" --config="${HADOOP_CONF_DIR}" \
+  --hbaseconfig="${HBASE_CONF_DIR}" --hadoop="${HADOOP_HOME}" "$@"

Modified: lucene/hadoop/trunk/src/contrib/hbase/bin/regionservers.sh
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/bin/regionservers.sh?rev=577355&r1=577354&r2=577355&view=diff
==============================================================================
--- lucene/hadoop/trunk/src/contrib/hbase/bin/regionservers.sh (original)
+++ lucene/hadoop/trunk/src/contrib/hbase/bin/regionservers.sh Wed Sep 19 09:45:01 2007
@@ -33,7 +33,8 @@
 #
 # Modelled after $HADOOP_HOME/bin/slaves.sh.

-usage="Usage: regionservers [--config=<confdir>] [--hbaseconfig=<hbase-confdir>] command..."
+usage="Usage: regionservers [--config=<hadoop-confdir>]\
+ [--hbaseconfig=<hbase-confdir>] command..."

 # if no args specified, show usage
 if [ $# -le 0 ]; then

Modified: lucene/hadoop/trunk/src/contrib/hbase/bin/start-hbase.sh
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/bin/start-hbase.sh?rev=577355&r1=577354&r2=577355&view=diff
==============================================================================
--- lucene/hadoop/trunk/src/contrib/hbase/bin/start-hbase.sh (original)
+++ lucene/hadoop/trunk/src/contrib/hbase/bin/start-hbase.sh Wed Sep 19 09:45:01 2007
@@ -38,5 +38,8 @@
 then
   exit $errCode
 fi
-"$bin"/hbase-daemon.sh --config="${HADOOP_CONF_DIR}" --hbaseconfig="${HBASE_CONF_DIR}" master start
-"$bin"/hbase-daemons.sh --config="${HADOOP_CONF_DIR}" --hbaseconfig="${HBASE_CONF_DIR}" regionserver start
+"$bin"/hbase-daemon.sh --config="${HADOOP_CONF_DIR}" \
+  --hbaseconfig="${HBASE_CONF_DIR}" master start
+"$bin"/hbase-daemons.sh --config="${HADOOP_CONF_DIR}" \
+  --hbaseconfig="${HBASE_CONF_DIR}" --hadoop="${HADOOP_HOME}" \
+  --hosts="${HBASE_REGIONSERVERS}" regionserver start

Modified: lucene/hadoop/trunk/src/contrib/hbase/bin/stop-hbase.sh
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/bin/stop-hbase.sh?rev=577355&r1=577354&r2=577355&view=diff
==============================================================================
--- lucene/hadoop/trunk/src/contrib/hbase/bin/stop-hbase.sh (original)
+++ lucene/hadoop/trunk/src/contrib/hbase/bin/stop-hbase.sh Wed Sep 19 09:45:01 2007
@@ -29,4 +29,5 @@

 . "$bin"/hbase-config.sh

-"$bin"/hbase-daemon.sh --config="${HADOOP_CONF_DIR}" --hbaseconfig="${HBASE_CONF_DIR}" master stop
+"$bin"/hbase-daemon.sh --config="${HADOOP_CONF_DIR}" \
+  --hbaseconfig="${HBASE_CONF_DIR}" master stop
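All of these wrappers forward --hadoop, --config, --hbaseconfig and --hosts to hbase-config.sh, which is not part of this diff. The following is only an assumed outline of the kind of option handling such a helper performs, written to match the flags and the two default assignments visible in the hbase-config.sh hunk above; the real script may parse its arguments differently.

  # Sketch only: hypothetical flag handling for a helper like hbase-config.sh.
  while [ $# -gt 0 ]; do
    case "$1" in
      --hadoop=*)      HADOOP_HOME="${1#--hadoop=}" ;;
      --config=*)      HADOOP_CONF_DIR="${1#--config=}" ;;
      --hbase=*)       HBASE_HOME="${1#--hbase=}" ;;
      --hbaseconfig=*) HBASE_CONF_DIR="${1#--hbaseconfig=}" ;;
      --hosts=*)       HBASE_REGIONSERVERS="${1#--hosts=}" ;;
      *) break ;;   # remaining args are the command and its arguments
    esac
    shift
  done

  # Defaults fall back to locations under the two homes, as the scripts above expect.
  HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_HOME/conf}"
  HBASE_CONF_DIR="${HBASE_CONF_DIR:-$HBASE_HOME/conf}"
  HBASE_REGIONSERVERS="${HBASE_REGIONSERVERS:-$HBASE_CONF_DIR/regionservers}"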