*tl;dr*
* I'm removing ubuntu-us1 from all pools
* Phoenix-Flume ITs look busted
* UpsertValuesIT looks busted
* Something is weirdly wrong with Phoenix-4.x-HBase-1.1 in its entirety.
Details below...
It looks like we have a bunch of different reasons for the failures.
Starting with Phoenix-master:
>>>
org.apache.phoenix.schema.NewerTableAlreadyExistsException: ERROR 1013
(42M04): Table already exists. tableName=T
at
org.apache.phoenix.end2end.UpsertValuesIT.testBatchedUpsert(UpsertValuesIT.java:476)
<<<
I've seen this coming out of a few different tests (I think I've also
run into it on my own, but that's another thing).
Some of them look like the Jenkins build host is just over-taxed:
>>>
Java HotSpot(TM) 64-Bit Server VM warning: INFO:
os::commit_memory(0x00000007e7600000, 331350016, 0) failed;
error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 331350016 bytes
for committing reserved memory.
# An error report file with more information is saved as:
#
/home/jenkins/jenkins-slave/workspace/Phoenix-master/phoenix-core/hs_err_pid26454.log
Java HotSpot(TM) 64-Bit Server VM warning: INFO:
os::commit_memory(0x00000007ea600000, 273678336, 0) failed;
error='Cannot allocate memory' (errno=12)
#
<<<
and
>>>
-------------------------------------------------------
T E S T S
-------------------------------------------------------
Build step 'Invoke top-level Maven targets' marked build as failure
<<<
Both of these issues are limited to the host "ubuntu-us1". Let me just
remove him from the pool (on Phoenix-master) and see if that helps at all.
I also see some sporadic failures of some Flume tests:
>>>
Running org.apache.phoenix.flume.PhoenixSinkIT
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.004
sec <<< FAILURE! - in org.apache.phoenix.flume.PhoenixSinkIT
org.apache.phoenix.flume.PhoenixSinkIT Time elapsed: 0.004 sec <<< ERROR!
java.lang.RuntimeException: java.io.IOException: Failed to save in any
storage directories while saving namespace.
Caused by: java.io.IOException: Failed to save in any storage
directories while saving namespace.
Running org.apache.phoenix.flume.RegexEventSerializerIT
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.005
sec <<< FAILURE! - in org.apache.phoenix.flume.RegexEventSerializerIT
org.apache.phoenix.flume.RegexEventSerializerIT Time elapsed: 0.004 sec
<<< ERROR!
java.lang.RuntimeException: java.io.IOException: Failed to save in any
storage directories while saving namespace.
Caused by: java.io.IOException: Failed to save in any storage
directories while saving namespace.
<<<
I'm not sure what the error message means at a glance.
For Phoenix-HBase-1.1:
>>>
org.apache.hadoop.hbase.DoNotRetryIOException:
java.lang.NoSuchMethodError:
java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
at
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodError:
java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
at
org.apache.hadoop.hbase.master.ServerManager.findServerWithSameHostnamePortWithLock(ServerManager.java:432)
at
org.apache.hadoop.hbase.master.ServerManager.checkAndRecordNewServer(ServerManager.java:346)
at
org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:264)
at
org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:318)
at
org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2117)
... 4 more
2016-04-28 22:54:35,497 WARN [RS:0;hemera:41302]
org.apache.hadoop.hbase.regionserver.HRegionServer(2279): error telling
master we are up
com.google.protobuf.ServiceException:
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException):
org.apache.hadoop.hbase.DoNotRetryIOException:
java.lang.NoSuchMethodError:
java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
at
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodError:
java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
at
org.apache.hadoop.hbase.master.ServerManager.findServerWithSameHostnamePortWithLock(ServerManager.java:432)
at
org.apache.hadoop.hbase.master.ServerManager.checkAndRecordNewServer(ServerManager.java:346)
at
org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:264)
at
org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:318)
at
org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2117)
... 4 more
at
org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
at
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:318)
at
org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8982)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2269)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:893)
at
org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
at
org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
at
org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:356)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
at
org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307)
at
org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
at java.lang.Thread.run(Thread.java:745)
Caused by:
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException):
org.apache.hadoop.hbase.DoNotRetryIOException:
java.lang.NoSuchMethodError:
java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
at
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodError:
java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
at
org.apache.hadoop.hbase.master.ServerManager.findServerWithSameHostnamePortWithLock(ServerManager.java:432)
at
org.apache.hadoop.hbase.master.ServerManager.checkAndRecordNewServer(ServerManager.java:346)
at
org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:264)
at
org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:318)
at
org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2117)
... 4 more
at
org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1235)
at
org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:217)
... 13 more
<<<
This error shows up intermittently, and it keeps hbase:namespace
from being assigned (as the RSs can never report in to the HMaster).
This is happening across a couple of the nodes (ubuntu-[3,4,6]). I had
tried to look into this one over the weekend (and was led to a JDK8-built
jar running on JDK7), but if I look at META-INF/MANIFEST.MF in
the hbase-server-1.1.3.jar from central, I see it was built with
1.7.0_80 (which I think means the JDK8 thought is a red herring). I'm
really confused by this one, actually. Something must be amiss here.
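If anyone wants to repeat the two checks above on one of the affected nodes, here is a rough sketch of how I'd do it. It prints the return type that the running JVM declares for ConcurrentHashMap.keySet() (java.util.Set on JDK7, ConcurrentHashMap$KeySetView on JDK8 and later) and the build JDK recorded in a jar's manifest. The class name JdkMismatchCheck is made up for illustration, and I'm assuming the manifest attribute is called "Build-Jdk", which is what the Maven archiver normally writes; a jar built some other way may not have it.

```java
import java.io.IOException;
import java.lang.reflect.Method;
import java.util.concurrent.ConcurrentHashMap;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper, not part of Phoenix or HBase.
public class JdkMismatchCheck {

    // Return type of ConcurrentHashMap.keySet() as declared by the running JVM's class library.
    static String keySetReturnType() throws NoSuchMethodException {
        Method m = ConcurrentHashMap.class.getMethod("keySet");
        return m.getReturnType().getName();
    }

    // Value of the "Build-Jdk" main attribute (assumed name), or null if it is absent.
    static String buildJdk(Manifest manifest) {
        return manifest.getMainAttributes().getValue("Build-Jdk");
    }

    // Same as above, reading META-INF/MANIFEST.MF out of the jar at jarPath.
    static String buildJdk(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest mf = jar.getManifest();
            return mf == null ? null : buildJdk(mf);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("ConcurrentHashMap.keySet() returns " + keySetReturnType());
        // Each argument is a path to a jar to inspect, e.g. the hbase-server jar
        // from the local Maven repository on the build host.
        for (String path : args) {
            System.out.println(path + " Build-Jdk = " + buildJdk(path));
        }
    }
}
```

Run it with the same JDK the Jenkins job uses and pass the jar path(s) as arguments. This only reports what the JVM and the manifest say; it doesn't by itself explain where the KeySetView reference is coming from.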
For Phoenix-HBase-1.0:
We see the same Phoenix-Flume failures, UpsertValuesIT failure, and
timeouts on ubuntu-us1. There is one crash on H10, but that might just
be bad luck.
For Phoenix-HBase-0.98:
Same UpsertValuesIT failure and failures on ubuntu-us1.
James Taylor wrote:
Anyone know why our Jenkins builds keep failing? Is it environmental and is
there anything we can do about it?
Thanks,
James