[ 
https://issues.apache.org/jira/browse/HBASE-23919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-23919:
----------------------------------
    Component/s: flakies
    Description: 
I've seen this on occasion across different hardwares; the standalone zookeeper 
won't come up and then random unit test fails in its junit startup phase as 
part of launching mini cluster.

I've been trying to track this w/ a while now adding in logging, using more of 
zk client instead of the copy/paste we've had a long while now, and adding in 
logging (I've been running locally w/ zk logging set to INFO).

It looks like this currently where the last thing out of the server launch is 
this....
{code:java}

  2020-03-02 07:57:46,129 INFO  [Time-limited test] 
server.ZooKeeperServer(854): maxSessionTimeout set to -1                        
                                                                                
                    2020-03-02 07:57:46,139 INFO  [Time-limited test] 
server.NIOServerCnxnFactory(89): binding to port 0.0.0.0/0.0.0.0:49316          
                                                                                
                    2020-03-02 07:57:46,181 INFO  [Time-limited test] 
zookeeper.MiniZooKeeperCluster(256): Started connectionTimeout=30000, 
dir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/
      cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/zookeeper_0, 
clientPort=49316, 
dataDir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/
 zookeeper_0/version-2, 
dataLogDir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/zookeeper_0/version-2,
 tickTime=2000,              maxClientCnxns=300, minSessionTimeout=4000, 
maxSessionTimeout=40000, serverId=0{code}
... then the client just does this over and over:
{code:java}
 2020-03-02 07:57:46,182 INFO  [Time-limited test] 
client.FourLetterWordMain(65): connecting to localhost 49316                    
                                                                                
                    2020-03-02 07:57:46,213 INFO  [Time-limited test] 
zookeeper.MiniZooKeeperCluster(453): localhost:49316 not up                     
                                                                                
                    java.net.SocketException: Connection reset                  
                                                                                
                                                                                
            at java.net.SocketInputStream.read(SocketInputStream.java:209)      
                                                                                
                                                                                
  at java.net.SocketInputStream.read(SocketInputStream.java:141)                
                                                                                
                                                                        at 
sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)                      
                                                                                
                                                                   at 
sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)                       
                                                                                
                                                                   at 
sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)                           
                                                                                
                                                                   at 
java.io.InputStreamReader.read(InputStreamReader.java:184)                      
                                                                                
                                                                   at 
java.io.BufferedReader.fill(BufferedReader.java:161)                            
                                                                                
                                                                   at 
java.io.BufferedReader.readLine(BufferedReader.java:324)                        
                                                                                
                                                                   at 
java.io.BufferedReader.readLine(BufferedReader.java:389)                        
                                                                                
                                                                   at 
org.apache.zookeeper.client.FourLetterWordMain.send4LetterWord(FourLetterWordMain.java:84)
                                                                                
                                                         at 
org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.waitForServerUp(MiniZooKeeperCluster.java:442)
                                                                                
                                              at 
org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.startup(MiniZooKeeperCluster.java:259)
                                                                                
                                                      at 
org.apache.hadoop.hbase.HBaseZKTestingUtility.startMiniZKCluster(HBaseZKTestingUtility.java:130)
                                                                                
                                                   at 
org.apache.hadoop.hbase.HBaseZKTestingUtility.startMiniZKCluster(HBaseZKTestingUtility.java:103)
                                                                                
                                                   at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1107)
                                                                                
                                                        at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1065)
                                                                                
                                                        at 
org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable.before(TestRegionReplicasWithModifyTable.java:62)
                                                                                
                           at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)                     
                                                                                
                                                                   at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)   
                                                                                
                                                                   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                                                                                
                                                              at 
java.lang.reflect.Method.invoke(Method.java:498)                                
                                                                                
                                                                   at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
                                                                                
                                                               at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
                                                                                
                                                                at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
                                                                                
                                                                 at 
org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
                                                                                
                                                                  at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)   
                                                                                
                                                                   at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)     
                                                                                
                                                                   at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
                                                                                
                                                 at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
                                                                                
                                                 at 
java.util.concurrent.FutureTask.run(FutureTask.java:266)                        
                                                                                
                                                                   at 
java.lang.Thread.run(Thread.java:745)
 {code}
 

There is ZOOKEEPER-2714 but it is closed as not-reproducible and it looks a 
little different going by the log emissions in that it registers the client 
connections.

Noting this here in case others have seen similar or there are ideas out there 
for how to fix.

Here for example is how the failure might look high-level:
{code:java}

 -------------------------------------------------------------------------------
 Test set: 
org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable
 -------------------------------------------------------------------------------
 Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.017 s <<< 
FAILURE! - in 
org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable
 org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable  Time 
elapsed: 0.01 s  <<< ERROR!
 java.io.IOException: Waiting for startup of standalone server; server 
isRunning=true
   at 
org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable.before(TestRegionReplicasWithModifyTable.java:62)
 {code}
... and then when you dig in, the test never even made it past cluster launch 
in setup. 

> [Flakey Test] Standalone Zookeeper won't start (minizookeepercluster won't 
> come up)
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-23919
>                 URL: https://issues.apache.org/jira/browse/HBASE-23919
>             Project: HBase
>          Issue Type: Bug
>          Components: flakies
>            Reporter: Michael Stack
>            Priority: Major
>
> I've seen this on occasion across different hardwares; the standalone 
> zookeeper won't come up and then random unit test fails in its junit startup 
> phase as part of launching mini cluster.
> I've been trying to track this w/ a while now adding in logging, using more 
> of zk client instead of the copy/paste we've had a long while now, and adding 
> in logging (I've been running locally w/ zk logging set to INFO).
> It looks like this currently where the last thing out of the server launch is 
> this....
> {code:java}
>   2020-03-02 07:57:46,129 INFO  [Time-limited test] 
> server.ZooKeeperServer(854): maxSessionTimeout set to -1                      
>                                                                               
>                         2020-03-02 07:57:46,139 INFO  [Time-limited test] 
> server.NIOServerCnxnFactory(89): binding to port 0.0.0.0/0.0.0.0:49316        
>                                                                               
>                         2020-03-02 07:57:46,181 INFO  [Time-limited test] 
> zookeeper.MiniZooKeeperCluster(256): Started connectionTimeout=30000, 
> dir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/
>       cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/zookeeper_0, 
> clientPort=49316, 
> dataDir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/
>  zookeeper_0/version-2, 
> dataLogDir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/zookeeper_0/version-2,
>  tickTime=2000,              maxClientCnxns=300, minSessionTimeout=4000, 
> maxSessionTimeout=40000, serverId=0{code}
> ... then the client just does this over and over:
> {code:java}
>  2020-03-02 07:57:46,182 INFO  [Time-limited test] 
> client.FourLetterWordMain(65): connecting to localhost 49316                  
>                                                                               
>                         2020-03-02 07:57:46,213 INFO  [Time-limited test] 
> zookeeper.MiniZooKeeperCluster(453): localhost:49316 not up                   
>                                                                               
>                         java.net.SocketException: Connection reset            
>                                                                               
>                                                                               
>                       at 
> java.net.SocketInputStream.read(SocketInputStream.java:209)                   
>                                                                               
>                                                                        at 
> java.net.SocketInputStream.read(SocketInputStream.java:141)                   
>                                                                               
>                                                                        at 
> sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)                    
>                                                                               
>                                                                        at 
> sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)                     
>                                                                               
>                                                                        at 
> sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)                         
>                                                                               
>                                                                        at 
> java.io.InputStreamReader.read(InputStreamReader.java:184)                    
>                                                                               
>                                                                        at 
> java.io.BufferedReader.fill(BufferedReader.java:161)                          
>                                                                               
>                                                                        at 
> java.io.BufferedReader.readLine(BufferedReader.java:324)                      
>                                                                               
>                                                                        at 
> java.io.BufferedReader.readLine(BufferedReader.java:389)                      
>                                                                               
>                                                                        at 
> org.apache.zookeeper.client.FourLetterWordMain.send4LetterWord(FourLetterWordMain.java:84)
>                                                                               
>                                                            at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.waitForServerUp(MiniZooKeeperCluster.java:442)
>                                                                               
>                                                 at 
> org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.startup(MiniZooKeeperCluster.java:259)
>                                                                               
>                                                         at 
> org.apache.hadoop.hbase.HBaseZKTestingUtility.startMiniZKCluster(HBaseZKTestingUtility.java:130)
>                                                                               
>                                                      at 
> org.apache.hadoop.hbase.HBaseZKTestingUtility.startMiniZKCluster(HBaseZKTestingUtility.java:103)
>                                                                               
>                                                      at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1107)
>                                                                               
>                                                           at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1065)
>                                                                               
>                                                           at 
> org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable.before(TestRegionReplicasWithModifyTable.java:62)
>                                                                               
>                              at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)                   
>                                                                               
>                                                                        at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
>                                                                               
>                                                                        at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>                                                                               
>                                                                 at 
> java.lang.reflect.Method.invoke(Method.java:498)                              
>                                                                               
>                                                                        at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>                                                                               
>                                                                  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>                                                                               
>                                                                   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>                                                                               
>                                                                    at 
> org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
>                                                                               
>                                                                     at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) 
>                                                                               
>                                                                        at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)   
>                                                                               
>                                                                        at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>                                                                               
>                                                    at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>                                                                               
>                                                    at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)                      
>                                                                               
>                                                                        at 
> java.lang.Thread.run(Thread.java:745)
>  {code}
>  
> There is ZOOKEEPER-2714 but it is closed as not-reproducible and it looks a 
> little different going by the log emissions in that it registers the client 
> connections.
> Noting this here in case others have seen similar or there are ideas out 
> there for how to fix.
> Here for example is how the failure might look high-level:
> {code:java}
>  
> -------------------------------------------------------------------------------
>  Test set: 
> org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable
>  
> -------------------------------------------------------------------------------
>  Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.017 s <<< 
> FAILURE! - in 
> org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable
>  org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable  Time 
> elapsed: 0.01 s  <<< ERROR!
>  java.io.IOException: Waiting for startup of standalone server; server 
> isRunning=true
>    at 
> org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable.before(TestRegionReplicasWithModifyTable.java:62)
>  {code}
> ... and then when you dig in, the test never even made it past cluster launch 
> in setup. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to