Good stuff, Ryan. I committed the change below to see if it fixes it:
pynchon-379:trunk stack$ svn diff src
Index: src/main/java/org/apache/hadoop/hbase/master/LoadBalancer.java
===================================================================
--- src/main/java/org/apache/hadoop/hbase/master/LoadBalancer.java	(revision 1032837)
+++ src/main/java/org/apache/hadoop/hbase/master/LoadBalancer.java	(working copy)
@@ -404,7 +404,8 @@
       assignments.put(server, new ArrayList<HRegionInfo>());
     }
     for (Map.Entry<HRegionInfo, HServerAddress> region : regions.entrySet()) {
-      HServerInfo server = serverMap.get(region.getValue());
+      HServerAddress hsa = region.getValue();
+      HServerInfo server = hsa == null? null: serverMap.get(hsa);
       if (server != null) {
         assignments.get(server).add(region.getKey());
       } else {
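
For anyone following along, here is a minimal standalone sketch of why the guard matters: a TreeMap with no comparator throws a NullPointerException on get(null), which matches the TreeMap.getEntry frame in the trace. This is not the HBase code. Plain Strings stand in for HRegionInfo, HServerAddress and HServerInfo, the class and method names are made up for illustration, and the "UNASSIGNED" bucket is only my placeholder for whatever the real else branch does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the retainAssignment loop; not HBase code.
public class RetainAssignmentSketch {

  // Returns true if an unguarded TreeMap.get(null) throws, as in the trace.
  static boolean unguardedGetThrows(TreeMap<String, String> serverMap) {
    try {
      serverMap.get(null);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  // Null-guarded lookup, same shape as the committed change.
  static String lookup(TreeMap<String, String> serverMap, String address) {
    return address == null ? null : serverMap.get(address);
  }

  // Groups regions by their previous server. Regions without a usable
  // previous server go under "UNASSIGNED" (an assumption of this sketch).
  static Map<String, List<String>> retain(Map<String, String> regions,
      TreeMap<String, String> serverMap) {
    Map<String, List<String>> assignments =
        new LinkedHashMap<String, List<String>>();
    for (String server : serverMap.values()) {
      assignments.put(server, new ArrayList<String>());
    }
    List<String> unassigned = new ArrayList<String>();
    for (Map.Entry<String, String> region : regions.entrySet()) {
      String server = lookup(serverMap, region.getValue());
      if (server != null) {
        assignments.get(server).add(region.getKey());
      } else {
        unassigned.add(region.getKey());
      }
    }
    assignments.put("UNASSIGNED", unassigned);
    return assignments;
  }

  public static void main(String[] args) {
    TreeMap<String, String> serverMap = new TreeMap<String, String>();
    serverMap.put("host1:60020", "server-1");

    Map<String, String> regions = new LinkedHashMap<String, String>();
    regions.put("region-a", "host1:60020");
    regions.put("region-b", null); // no previous address recorded

    System.out.println(unguardedGetThrows(serverMap));
    System.out.println(retain(regions, serverMap));
  }
}
```

With the guard in place, the region with a null address falls through to the else branch instead of killing the caller.
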
St.Ack
On Mon, Nov 8, 2010 at 8:21 PM, Ryan Rawson <[email protected]> wrote:
> Looks like this first showed up here:
>
> https://hudson.apache.org/hudson/job/HBase-TRUNK/1629/
>
> Which included:
> https://issues.apache.org/jira/browse/HBASE-2896
>
> I'm guessing 'retainAssignment' was part of the added code path.
>
> On Mon, Nov 8, 2010 at 8:16 PM, Ryan Rawson <[email protected]> wrote:
>> Looks like this latest series of failures may be due to this:
>>
>> 2010-11-09 03:37:41,892 FATAL [Master:0;vesta.apache.org:52791] master.HMaster(884): Unhandled exception. Starting shutdown.
>> java.lang.NullPointerException
>>         at java.util.TreeMap.getEntry(TreeMap.java:324)
>>         at java.util.TreeMap.get(TreeMap.java:255)
>>         at org.apache.hadoop.hbase.master.LoadBalancer.retainAssignment(LoadBalancer.java:407)
>>         at org.apache.hadoop.hbase.master.AssignmentManager.assignAllUserRegions(AssignmentManager.java:1126)
>>         at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:386)
>>         at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:272)
>>         at java.lang.Thread.run(Thread.java:619)
>>
>> On Mon, Nov 8, 2010 at 6:30 PM, Apache Hudson Server <[email protected]> wrote:
>>> See <https://hudson.apache.org/hudson/job/HBase-TRUNK/1633/changes>
>>>
>>> Changes:
>>>
>>> [jdcryans] HBASE-3208 HLog.findMemstoresWithEditsOlderThan needs to look for edits that are equal to too
>>>
>>> ------------------------------------------
>>> [...truncated 807 lines...]
>>> A src/main/resources/org/apache/hadoop/hbase/mapreduce
>>> AU src/main/resources/org/apache/hadoop/hbase/mapreduce/RowCounter_Counters.properties
>>> A src/main/resources/org/apache/hadoop/hbase/mapred
>>> AU src/main/resources/org/apache/hadoop/hbase/mapred/RowCounter_Counters.properties
>>> A src/main/resources/org/apache/hadoop/hbase/rest
>>> A src/main/resources/org/apache/hadoop/hbase/rest/protobuf
>>> A src/main/resources/org/apache/hadoop/hbase/rest/protobuf/TableSchemaMessage.proto
>>> A src/main/resources/org/apache/hadoop/hbase/rest/protobuf/ScannerMessage.proto
>>> A src/main/resources/org/apache/hadoop/hbase/rest/protobuf/StorageClusterStatusMessage.proto
>>> A src/main/resources/org/apache/hadoop/hbase/rest/protobuf/CellSetMessage.proto
>>> A src/main/resources/org/apache/hadoop/hbase/rest/protobuf/ColumnSchemaMessage.proto
>>> A src/main/resources/org/apache/hadoop/hbase/rest/protobuf/CellMessage.proto
>>> A src/main/resources/org/apache/hadoop/hbase/rest/protobuf/TableInfoMessage.proto
>>> A src/main/resources/org/apache/hadoop/hbase/rest/protobuf/TableListMessage.proto
>>> A src/main/resources/org/apache/hadoop/hbase/rest/protobuf/VersionMessage.proto
>>> A src/main/resources/org/apache/hadoop/hbase/rest/XMLSchema.xsd
>>> A src/main/xslt
>>> A src/main/xslt/configuration_to_docbook_section.xsl
>>> A src/site
>>> A src/site/site.xml
>>> A src/site/site.vm
>>> A src/site/resources
>>> A src/site/resources/images
>>> AU src/site/resources/images/replication_overview.png
>>> AU src/site/resources/images/asf_logo_wide.png
>>> AU src/site/resources/images/architecture.gif
>>> AU src/site/resources/images/hadoop-logo.jpg
>>> AU src/site/resources/images/hbase_logo_med.gif
>>> AU src/site/resources/images/favicon.ico
>>> AU src/site/resources/images/hbase_small.gif
>>> A src/site/resources/css
>>> A src/site/resources/css/site.css
>>> A src/site/xdoc
>>> A src/site/xdoc/cygwin.xml
>>> A src/site/xdoc/acid-semantics.xml
>>> A src/site/xdoc/metrics.xml
>>> A src/site/xdoc/index.xml
>>> A src/site/xdoc/replication.xml
>>> A src/site/xdoc/old_news.xml
>>> A src/site/xdoc/bulk-loads.xml
>>> A src/site/xdoc/pseudo-distributed.xml
>>> A src/site/fml
>>> A src/site/fml/faq.fml
>>> A src/docbkx
>>> A src/docbkx/book.xml
>>> AU src/saveVersion.sh
>>> A src/examples
>>> A src/examples/thrift
>>> A src/examples/thrift/DemoClient.java
>>> A src/examples/thrift/DemoClient.cpp
>>> A src/examples/thrift/DemoClient.rb
>>> A src/examples/thrift/DemoClient.php
>>> AU src/examples/thrift/DemoClient.py
>>> A src/examples/thrift/README.txt
>>> A src/examples/thrift/Makefile
>>> A src/examples/mapreduce
>>> A src/examples/mapreduce/org
>>> A src/examples/mapreduce/org/apache
>>> A src/examples/mapreduce/org/apache/hadoop
>>> A src/examples/mapreduce/org/apache/hadoop/hbase
>>> A src/examples/mapreduce/org/apache/hadoop/hbase/mapreduce
>>> A src/examples/mapreduce/org/apache/hadoop/hbase/mapreduce/SampleUploader.java
>>> A src/examples/mapreduce/org/apache/hadoop/hbase/mapreduce/IndexBuilder.java
>>> A src/examples/mapreduce/index-builder-setup.rb
>>> A src/examples/README.txt
>>> A bin
>>> AU bin/hbase-daemons.sh
>>> A bin/rename_table.rb
>>> AU bin/hbase
>>> A bin/copy_table.rb
>>> A bin/check_meta.rb
>>> AU bin/start-hbase.sh
>>> A bin/hirb.rb
>>> A bin/set_meta_block_caching.rb
>>> A bin/loadtable.rb
>>> AU bin/hbase-daemon.sh
>>> A bin/hbase-config.sh
>>> A bin/local-regionservers.sh
>>> AU bin/zookeepers.sh
>>> A bin/local-master-backup.sh
>>> A bin/add_table.rb
>>> AU bin/rolling-restart.sh
>>> AU bin/regionservers.sh
>>> AU bin/master-backup.sh
>>> A bin/replication
>>> A bin/replication/copy_tables_desc.rb
>>> AU bin/stop-hbase.sh
>>> A pom.xml
>>> A README.txt
>>> U .
>>> At revision 1032806
>>> [locks-and-latches] Checking to see if we really have the locks
>>> [locks-and-latches] Have all the locks, build can start
>>> [trunk] $ /home/hudson/tools/maven/apache-maven-2.2.1/bin/mvn clean -Dmaven.test.redirectTestOutputToFile=true install assembly:assembly
>>> [INFO] Scanning for projects...
>>> [INFO] ------------------------------------------------------------------------
>>> [INFO] Building HBase
>>> [INFO] task-segment: [clean, install]
>>> [INFO] ------------------------------------------------------------------------
>>> [INFO] [clean:clean {execution: default-clean}]
>>> [INFO] [antrun:run {execution: generate}]
>>> [INFO] Executing tasks
>>> [mkdir] Created dir: <https://hudson.apache.org/hudson/job/HBase-TRUNK/ws/trunk/target/hbase-webapps>
>>> [copy] Copying 4 files to <https://hudson.apache.org/hudson/job/HBase-TRUNK/ws/trunk/target/hbase-webapps>
>>> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
>>> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
>>> 2010-11-09 01:46:53.286:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
>>> [mkdir] Created dir: <https://hudson.apache.org/hudson/job/HBase-TRUNK/ws/trunk/target/hbase-webapps/master/WEB-INF>
>>> [mkdir] Created dir: <https://hudson.apache.org/hudson/job/HBase-TRUNK/ws/trunk/target/hbase-webapps/regionserver/WEB-INF>
>>> [INFO] Executed tasks
>>> [INFO] [build-helper:add-source {execution: add-jspc-source}]
>>> [INFO] Source directory: <https://hudson.apache.org/hudson/job/HBase-TRUNK/ws/trunk/target/jspc> added.
>>> [INFO] [build-helper:add-source {execution: add-package-info}]
>>> [INFO] Source directory: <https://hudson.apache.org/hudson/job/HBase-TRUNK/ws/trunk/target/generated-sources> added.
>>> [INFO] Setting property: classpath.resource.loader.class => 'org.codehaus.plexus.velocity.ContextClassLoaderResourceLoader'.
>>> [INFO] Setting property: velocimacro.messages.on => 'false'.
>>> [INFO] Setting property: resource.loader => 'classpath'.
>>> [INFO] Setting property: resource.manager.logwhenfound => 'false'.
>>> [INFO] [remote-resources:process {execution: default}]
>>> [INFO] [resources:resources {execution: default-resources}]
>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>> [INFO] Copying 1 resource
>>> [INFO] Copying 6 resources
>>> [INFO] Copying 3 resources
>>> [INFO] [compiler:compile {execution: default-compile}]
>>> [INFO] Compiling 436 source files to <https://hudson.apache.org/hudson/job/HBase-TRUNK/ws/trunk/target/classes>
>>> [INFO] [resources:testResources {execution: default-testResources}]
>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>> [INFO] Copying 4 resources
>>> [INFO] Copying 3 resources
>>> [INFO] [compiler:testCompile {execution: default-testCompile}]
>>> [INFO] Compiling 173 source files to <https://hudson.apache.org/hudson/job/HBase-TRUNK/ws/trunk/target/test-classes>
>>> [INFO] [surefire:test {execution: default-test}]
>>> [INFO] Surefire report directory: <https://hudson.apache.org/hudson/job/HBase-TRUNK/ws/trunk/target/surefire-reports>
>>>
>>> -------------------------------------------------------
>>> T E S T S
>>> -------------------------------------------------------
>>> Running org.apache.hadoop.hbase.regionserver.TestColumnSeeking
>>> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.627 sec
>>> Running org.apache.hadoop.hbase.client.TestMultipleTimestamps
>>> Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 104.018 sec
>>> Running org.apache.hadoop.hbase.TestZooKeeper
>>> Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 70.187 sec
>>> Running org.apache.hadoop.hbase.regionserver.wal.TestLogRolling
>>> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 119.983 sec
>>> Running org.apache.hadoop.hbase.io.hfile.TestCachedBlockQueue
>>> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.024 sec
>>> Running org.apache.hadoop.hbase.filter.TestPrefixFilter
>>> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.019 sec
>>> Running org.apache.hadoop.hbase.io.TestImmutableBytesWritable
>>> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.061 sec
>>> Running org.apache.hadoop.hbase.io.TestHeapSize
>>> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.037 sec
>>> Running org.apache.hadoop.hbase.rest.model.TestRowModel
>>> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.059 sec
>>> Running org.apache.hadoop.hbase.io.hfile.TestHFileSeek
>>> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.373 sec
>>> Running org.apache.hadoop.hbase.regionserver.TestStore
>>> Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.347 sec
>>> Running org.apache.hadoop.hbase.zookeeper.TestHQuorumPeer
>>> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.216 sec
>>> Running org.apache.hadoop.hbase.rest.TestScannersWithFilters
>>> Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.42 sec
>>> Running org.apache.hadoop.hbase.io.hfile.TestHFilePerformance
>>> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.793 sec
>>> Running org.apache.hadoop.hbase.io.TestHbaseObjectWritable
>>> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.045 sec
>>> Running org.apache.hadoop.hbase.regionserver.wal.TestHLogSplit
>>> Tests run: 19, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 68.873 sec
>>> Running org.apache.hadoop.hbase.thrift.TestThriftServer
>>> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 77.824 sec
>>> Running org.apache.hadoop.hbase.filter.TestColumnPrefixFilter
>>> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.164 sec
>>> Running org.apache.hadoop.hbase.master.TestZKBasedOpenCloseRegion
>>> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 49.315 sec
>>> Running org.apache.hadoop.hbase.TestCompare
>>> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.017 sec
>>> Running org.apache.hadoop.hbase.regionserver.TestExplicitColumnTracker
>>> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.56 sec
>>> Running org.apache.hadoop.hbase.util.TestEnvironmentEdgeManager
>>> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.225 sec
>>> Running org.apache.hadoop.hbase.util.TestDefaultEnvironmentEdge
>>> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.019 sec
>>> Running org.apache.hadoop.hbase.rest.TestTransform
>>> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.082 sec
>>> Running org.apache.hadoop.hbase.util.TestFSUtils
>>> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 24.449 sec
>>> Running org.apache.hadoop.hbase.master.TestMasterTransitions
>>> Tests run: 3, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 71.292 sec
>>> Running org.apache.hadoop.hbase.regionserver.TestScanWildcardColumnTracker
>>> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.119 sec
>>> Running org.apache.hadoop.hbase.regionserver.TestKeyValueSkipListSet
>>> Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.023 sec
>>> Running org.apache.hadoop.hbase.filter.TestPageFilter
>>> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.016 sec
>>> Running org.apache.hadoop.hbase.io.hfile.TestSeekTo
>>> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.215 sec
>>> Running org.apache.hadoop.hbase.filter.TestColumnPaginationFilter
>>> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.016 sec
>>> Running org.apache.hadoop.hbase.rest.TestStatusResource
>>> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 24.427 sec
>>> Running org.apache.hadoop.hbase.executor.TestExecutorService
>>> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.049 sec
>>> Running org.apache.hadoop.hbase.client.TestFromClientSide
>>> Tests run: 40, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 337.378 sec
>>> Running org.apache.hadoop.hbase.replication.TestReplication
>>> Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 179.535 sec
>>> Running org.apache.hadoop.hbase.regionserver.TestCompaction
>>> Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 140.706 sec
>>> Running org.apache.hadoop.hbase.filter.TestSingleColumnValueFilter
>>> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.035 sec
>>> Running org.apache.hadoop.hbase.mapreduce.TestTimeRangeMapRed
>>> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.827 sec
>>> Running org.apache.hadoop.hbase.zookeeper.TestZooKeeperMainServerArg
>>> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.032 sec
>>> Running org.apache.hadoop.hbase.mapreduce.TestSimpleTotalOrderPartitioner
>>> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.118 sec
>>> Running org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
>>> Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 149.01 sec
>>> Running org.apache.hadoop.hbase.regionserver.TestFSErrorsExposed
>>> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.199 sec
>>> Running org.apache.hadoop.hbase.client.replication.TestReplicationAdmin
>>> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.387 sec
>>> Running org.apache.hadoop.hbase.regionserver.TestScanDeleteTracker
>>> Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.185 sec
>>> Running org.apache.hadoop.hbase.client.TestMetaScanner
>>> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.817 sec
>>> Running org.apache.hadoop.hbase.metrics.TestMetricsMBeanBase
>>> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.032 sec
>>> Running org.apache.hadoop.hbase.TestRegionRebalancing
>>> killed.
>>> [INFO] ------------------------------------------------------------------------
>>> [ERROR] BUILD ERROR
>>> [INFO] ------------------------------------------------------------------------
>>> [INFO] Error while executing forked tests.; nested exception is org.apache.maven.surefire.booter.shade.org.codehaus.plexus.util.cli.CommandLineException: Error while executing external command, process killed.
>>>
>>> Process timeout out after 900 seconds
>>> [INFO] ------------------------------------------------------------------------
>>> [INFO] For more information, run Maven with the -e switch
>>> [INFO] ------------------------------------------------------------------------
>>> [INFO] Total time: 43 minutes 36 seconds
>>> [INFO] Finished at: Tue Nov 09 02:30:23 UTC 2010
>>> [INFO] Final Memory: 59M/487M
>>> [INFO] ------------------------------------------------------------------------
>>> [locks-and-latches] Releasing all the locks
>>> [locks-and-latches] All the locks released
>>> Archiving artifacts
>>> Recording test results
>>>
>>>
>>
>