James,
Ah, I didn't notice that timeouts are not shown in the final report as failures. It seems that the build is using JDK 1.7 and the tests run out of memory (OOM) in PermGen space. Fixed in PHOENIX-2879.

Thanks,
Sergey

On Wed, May 4, 2016 at 1:48 PM, James Taylor <[email protected]> wrote:
> Sergey, on master branch (which is HBase 1.2):
> https://builds.apache.org/job/Phoenix-master/1214/console
>
> On Wed, May 4, 2016 at 1:31 PM, Sergey Soldatov <[email protected]> wrote:
>>
>> James,
>> Regarding HivePhoenixStoreIT. Are you talking about
>> Phoenix-4.x-HBase-1.0 job? Last build passed it successfully.
>>
>>
>> On Wed, May 4, 2016 at 10:15 AM, James Taylor <[email protected]> wrote:
>> > Our Jenkins builds have improved, but we're seeing some issues:
>> > - timeouts with the new org.apache.phoenix.hive.HivePhoenixStoreIT test.
>> > - consistent failure with 4.x-HBase-1.1 build. I suspect that Jenkins
>> > build is out-of-date, as we haven't had a 4.x-HBase-1.1 branch for quite
>> > a while. There's likely some changes that were made to the other Jenkins
>> > build scripts that weren't made to this one
>> > - flapping of the
>> > org.apache.phoenix.end2end.index.ReadOnlyIndexFailureIT.testWriteFailureReadOnlyIndex
>> > test in 0.98 and 1.0
>> > - no email sent for 0.98 build (as far as I can tell)
>> >
>> > If folks have time to look into these, that'd be much appreciated.
>> >
>> > James
>> >
>> >
>> >
>> > On Sat, Apr 30, 2016 at 11:55 AM, James Taylor <[email protected]> wrote:
>> >
>> >> The defaults when tests are running are much lower than the standard
>> >> Phoenix defaults (see QueryServicesTestImpl and
>> >> BaseTest.setUpConfigForMiniCluster()). It's unclear to me why the
>> >> HashJoinIT and SortMergeJoinIT tests (I think these are the culprits) do
>> >> not seem to adhere to these (or maybe override them?). They fail for me on
>> >> my Mac, but they do pass on a Linux box. Would be awesome if someone could
>> >> investigate and submit a patch to fix these.
>> >>
>> >> Thanks,
>> >> James
>> >>
>> >> On Sat, Apr 30, 2016 at 11:47 AM, Nick Dimiduk <[email protected]> wrote:
>> >>
>> >>> The default thread pool sizes for HDFS, HBase, ZK, and the Phoenix client
>> >>> are all contributing to this huge thread count.
>> >>>
>> >>> A good starting point would be to take a jstack of the IT process and
>> >>> count, group by threads with similar name. Reconfigure to reduce all those
>> >>> groups to something like 10 each, see if the test still runs reliably on
>> >>> local hardware.
>> >>>
>> >>> On Friday, April 29, 2016, Sergey Soldatov <[email protected]> wrote:
>> >>>
>> >>> > by the way, we need to do something with those OOMs and "unable to
>> >>> > create new native thread" in ITs. It's quite strange to see in 10
>> >>> > lines test such kind of failures. Especially when queries for table
>> >>> > with less than 10 rows generate over 2500 threads. Does anybody know
>> >>> > whether it's zk related issue?
>> >>> >
>> >>> > On Fri, Apr 29, 2016 at 7:51 AM, James Taylor <[email protected]> wrote:
>> >>> > > A patch would be much appreciated, Sergey.
>> >>> > >
>> >>> > > On Fri, Apr 29, 2016 at 3:26 AM, Sergey Soldatov <[email protected]> wrote:
>> >>> > >
>> >>> > >> As for flume module - flume-ng is coming with commons-io 2.1 while
>> >>> > >> hadoop & hbase require org.apache.commons.io.Charsets which was
>> >>> > >> introduced in 2.3. Easy way is to move dependency on flume-ng after
>> >>> > >> the dependencies on hbase/hadoop.
>> >>> > >>
>> >>> > >> The last thing about ConcurrentHashMap - it definitely means that the
>> >>> > >> code was compiled with 1.8 since 1.7 returns a simple Set while 1.8
>> >>> > >> returns KeySetView
>> >>> > >>
>> >>> > >>
>> >>> > >>
>> >>> > >> On Thu, Apr 28, 2016 at 4:08 PM, Josh Elser <[email protected]> wrote:
>> >>> > >> > *tl;dr*
>> >>> > >> >
>> >>> > >> > * I'm removing ubuntu-us1 from all pools
>> >>> > >> > * Phoenix-Flume ITs look busted
>> >>> > >> > * UpsertValuesIT looks busted
>> >>> > >> > * Something is weirdly wrong with Phoenix-4.x-HBase-1.1 in its entirety.
>> >>> > >> >
>> >>> > >> > Details below...
>> >>> > >> >
>> >>> > >> > It looks like we have a bunch of different reasons for the failures.
>> >>> > >> > Starting with Phoenix-master:
>> >>> > >> >
>> >>> > >> >>>>
>> >>> > >> > org.apache.phoenix.schema.NewerTableAlreadyExistsException: ERROR 1013
>> >>> > >> > (42M04): Table already exists. tableName=T
>> >>> > >> >     at org.apache.phoenix.end2end.UpsertValuesIT.testBatchedUpsert(UpsertValuesIT.java:476)
>> >>> > >> > <<<
>> >>> > >> >
>> >>> > >> > I've seen this coming out of a few different tests (I think I've also run
>> >>> > >> > into it on my own, but that's another thing)
>> >>> > >> >
>> >>> > >> > Some of them look like the Jenkins build host is just over-taxed:
>> >>> > >> >
>> >>> > >> >>>>
>> >>> > >> > Java HotSpot(TM) 64-Bit Server VM warning: INFO:
>> >>> > >> > os::commit_memory(0x00000007e7600000, 331350016, 0) failed; error='Cannot
>> >>> > >> > allocate memory' (errno=12)
>> >>> > >> > #
>> >>> > >> > # There is insufficient memory for the Java Runtime Environment to continue.
>> >>> > >> > # Native memory allocation (malloc) failed to allocate 331350016 bytes for
>> >>> > >> > committing reserved memory.
>> >>> > >> > # An error report file with more information is saved as:
>> >>> > >> > # /home/jenkins/jenkins-slave/workspace/Phoenix-master/phoenix-core/hs_err_pid26454.log
>> >>> > >> > Java HotSpot(TM) 64-Bit Server VM warning: INFO:
>> >>> > >> > os::commit_memory(0x00000007ea600000, 273678336, 0) failed; error='Cannot
>> >>> > >> > allocate memory' (errno=12)
>> >>> > >> > #
>> >>> > >> > <<<
>> >>> > >> >
>> >>> > >> > and
>> >>> > >> >
>> >>> > >> >>>>
>> >>> > >> > -------------------------------------------------------
>> >>> > >> > T E S T S
>> >>> > >> > -------------------------------------------------------
>> >>> > >> > Build step 'Invoke top-level Maven targets' marked build as failure
>> >>> > >> > <<<
>> >>> > >> >
>> >>> > >> > Both of these issues are limited to the host "ubuntu-us1". Let me just
>> >>> > >> > remove him from the pool (on Phoenix-master) and see if that helps at all.
>> >>> > >> >
>> >>> > >> > I also see some sporadic failures of some Flume tests
>> >>> > >> >
>> >>> > >> >>>>
>> >>> > >> > Running org.apache.phoenix.flume.PhoenixSinkIT
>> >>> > >> > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.004 sec <<< FAILURE! - in org.apache.phoenix.flume.PhoenixSinkIT
>> >>> > >> > org.apache.phoenix.flume.PhoenixSinkIT Time elapsed: 0.004 sec <<< ERROR!
>> >>> > >> > java.lang.RuntimeException: java.io.IOException: Failed to save in any
>> >>> > >> > storage directories while saving namespace.
>> >>> > >> > Caused by: java.io.IOException: Failed to save in any storage directories
>> >>> > >> > while saving namespace.
>> >>> > >> >
>> >>> > >> > Running org.apache.phoenix.flume.RegexEventSerializerIT
>> >>> > >> > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.005 sec <<< FAILURE! - in org.apache.phoenix.flume.RegexEventSerializerIT
>> >>> > >> > org.apache.phoenix.flume.RegexEventSerializerIT Time elapsed: 0.004 sec <<< ERROR!
>> >>> > >> > java.lang.RuntimeException: java.io.IOException: Failed to save in any
>> >>> > >> > storage directories while saving namespace.
>> >>> > >> > Caused by: java.io.IOException: Failed to save in any storage directories
>> >>> > >> > while saving namespace.
>> >>> > >> > <<<
>> >>> > >> >
>> >>> > >> > I'm not sure what the error message means at a glance.
>> >>> > >> >
>> >>> > >> > For Phoenix-HBase-1.1:
>> >>> > >> >
>> >>> > >> >>>>
>> >>> > >> > org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoSuchMethodError:
>> >>> > >> > java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> >>> > >> >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
>> >>> > >> >     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
>> >>> > >> >     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>> >>> > >> >     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>> >>> > >> >     at java.lang.Thread.run(Thread.java:745)
>> >>> > >> > Caused by: java.lang.NoSuchMethodError:
>> >>> > >> > java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> >>> > >> >     at org.apache.hadoop.hbase.master.ServerManager.findServerWithSameHostnamePortWithLock(ServerManager.java:432)
>> >>> > >> >     at org.apache.hadoop.hbase.master.ServerManager.checkAndRecordNewServer(ServerManager.java:346)
>> >>> > >> >     at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:264)
>> >>> > >> >     at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:318)
>> >>> > >> >     at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
>> >>> > >> >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2117)
>> >>> > >> >     ... 4 more
>> >>> > >> > 2016-04-28 22:54:35,497 WARN [RS:0;hemera:41302]
>> >>> > >> > org.apache.hadoop.hbase.regionserver.HRegionServer(2279): error telling
>> >>> > >> > master we are up
>> >>> > >> > com.google.protobuf.ServiceException:
>> >>> > >> > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException):
>> >>> > >> > org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoSuchMethodError:
>> >>> > >> > java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> >>> > >> >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
>> >>> > >> >     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
>> >>> > >> >     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>> >>> > >> >     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>> >>> > >> >     at java.lang.Thread.run(Thread.java:745)
>> >>> > >> > Caused by: java.lang.NoSuchMethodError:
>> >>> > >> > java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> >>> > >> >     at org.apache.hadoop.hbase.master.ServerManager.findServerWithSameHostnamePortWithLock(ServerManager.java:432)
>> >>> > >> >     at org.apache.hadoop.hbase.master.ServerManager.checkAndRecordNewServer(ServerManager.java:346)
>> >>> > >> >     at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:264)
>> >>> > >> >     at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:318)
>> >>> > >> >     at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
>> >>> > >> >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2117)
>> >>> > >> >     ... 4 more
>> >>> > >> >
>> >>> > >> >     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
>> >>> > >> >     at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:318)
>> >>> > >> >     at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8982)
>> >>> > >> >     at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2269)
>> >>> > >> >     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:893)
>> >>> > >> >     at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
>> >>> > >> >     at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
>> >>> > >> >     at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
>> >>> > >> >     at java.security.AccessController.doPrivileged(Native Method)
>> >>> > >> >     at javax.security.auth.Subject.doAs(Subject.java:356)
>> >>> > >> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
>> >>> > >> >     at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307)
>> >>> > >> >     at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
>> >>> > >> >     at java.lang.Thread.run(Thread.java:745)
>> >>> > >> > Caused by:
>> >>> > >> > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException):
>> >>> > >> > org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoSuchMethodError:
>> >>> > >> > java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> >>> > >> >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
>> >>> > >> >     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
>> >>> > >> >     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>> >>> > >> >     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>> >>> > >> >     at java.lang.Thread.run(Thread.java:745)
>> >>> > >> > Caused by: java.lang.NoSuchMethodError:
>> >>> > >> > java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> >>> > >> >     at org.apache.hadoop.hbase.master.ServerManager.findServerWithSameHostnamePortWithLock(ServerManager.java:432)
>> >>> > >> >     at org.apache.hadoop.hbase.master.ServerManager.checkAndRecordNewServer(ServerManager.java:346)
>> >>> > >> >     at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:264)
>> >>> > >> >     at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:318)
>> >>> > >> >     at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
>> >>> > >> >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2117)
>> >>> > >> >     ... 4 more
>> >>> > >> >
>> >>> > >> >     at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1235)
>> >>> > >> >     at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:217)
>> >>> > >> >     ... 13 more
>> >>> > >> > <<<
>> >>> > >> >
>> >>> > >> > We have hit-or-miss on this error message which keeps hbase:namespace from
>> >>> > >> > being assigned (as the RS's can never report into the hmaster). This is
>> >>> > >> > happening across a couple of the nodes (ubuntu-[3,4,6]). I had tried to look
>> >>> > >> > into this one over the weekend (and was led to a JDK8 built jar, running on
>> >>> > >> > JDK7), but if I look at META-INF/MANIFEST.mf in the hbase-server-1.1.3.jar
>> >>> > >> > from central, I see it was built with 1.7.0_80 (which I think means the JDK8
>> >>> > >> > thought is a red-herring). I'm really confused by this one, actually.
>> >>> > >> > Something must be amiss here.
>> >>> > >> >
>> >>> > >> > For Phoenix-HBase-1.0:
>> >>> > >> >
>> >>> > >> > We see the same Phoenix-Flume failures, UpsertValuesIT failure, and timeouts
>> >>> > >> > on ubuntu-us1. There is one crash on H10, but that might just be bad luck.
>> >>> > >> >
>> >>> > >> > For Phoenix-HBase-0.98:
>> >>> > >> >
>> >>> > >> > Same UpsertValuesIT failure and failures on ubuntu-us1.
>> >>> > >> >
>> >>> > >> >
>> >>> > >> > James Taylor wrote:
>> >>> > >> >>
>> >>> > >> >> Anyone know why our Jenkins builds keep failing? Is it environmental and
>> >>> > >> >> is there anything we can do about it?
>> >>> > >> >>
>> >>> > >> >> Thanks,
>> >>> > >> >> James
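
A note on the ConcurrentHashMap point raised in the thread: the NoSuchMethodError names the descriptor keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;, and that return type only exists in the Java 8 and later class library. A minimal sketch for checking what the JDK on a given build host declares, and for a call pattern that links on both 1.7 and 1.8, might look like the following. The class KeySetCheck and its two helper methods are hypothetical names used only for illustration; they are not part of HBase or Phoenix, and this is not the change that was made in either project.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper class, for illustration only.
public class KeySetCheck {

    // Reports the declared return type of ConcurrentHashMap.keySet() on the
    // JDK that is running this code. Class.getMethod prefers the most
    // specific return type when a bridge method is also present.
    public static String keySetReturnType() throws NoSuchMethodException {
        Method m = ConcurrentHashMap.class.getMethod("keySet");
        return m.getReturnType().getName();
    }

    // Calls keySet() through the Map interface. The compiled call site then
    // refers to Map.keySet() returning java.util.Set, which exists on both
    // JDK 1.7 and 1.8, instead of the 1.8-only KeySetView variant.
    public static Set<String> portableKeys(Map<String, Integer> map) {
        return map.keySet();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("ConcurrentHashMap.keySet() returns " + keySetReturnType());

        ConcurrentHashMap<String, Integer> servers = new ConcurrentHashMap<String, Integer>();
        servers.put("rs1", 1);
        System.out.println("keys via Map interface: " + portableKeys(servers));
    }
}
```

Run on JDK 1.7 this should report java.util.Set, and on JDK 1.8 or later the nested KeySetView type. Code that holds the map as a ConcurrentHashMap and is compiled against the 1.8 class library gets the KeySetView descriptor baked into the call site, which is consistent with the error in the trace when such a jar runs on 1.7.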
