Can someone take a look at running IntegrationTestBigLinkedList for > 5 iterations with the slowDeterministic chaos monkey on a distributed cluster? I'm pretty concerned about HBASE-9338.
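For anyone picking this up, a sketch of how such a run might be launched. The class name matches the test referenced later in the thread; the `-m slowDeterministic` monkey flag and the `loop` argument order are my assumption from the 0.96-era integration test drivers, and a real run needs a live distributed cluster, so the command is only assembled and printed here:

```shell
# Hypothetical ITBLL invocation; verify the flags against your build before running.
ITBLL="org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList"
# loop <iterations> <mappers> <nodes-per-mapper> <output-dir> <reducers>
echo "hbase ${ITBLL} -m slowDeterministic loop 5 10 1000000 /tmp/itbll 10"
```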
On Tue, Sep 3, 2013 at 6:57 AM, Jean-Marc Spaggiari <[email protected]> wrote:

> There was a typo in my log4j.properties :(
>
> So it's working fine.
>
> The only INFO logs I still see are these:
> 2013-09-03 09:53:07,313 INFO [M:0;t430s:45176] mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> 2013-09-03 09:53:07,350 INFO [M:0;t430s:45176] mortbay.log: jetty-6.1.26
> But there are only very few of them.
>
> Performance-wise, here are the numbers (the higher, the better; rows per
> second, except for scans where it's rows/min). As you will see, 0.96 is
> slower only for RandomSeekScanTest (way slower) and RandomScanWithRange10,
> but is faster for everything else. I ran the tests with the default
> settings. So I think we should look at RandomSeekScanTest, but except for
> this one, everything else is pretty good.
>
> Also, I have been able to reproduce this exception:
> 2013-09-03 09:55:36,718 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid 0x140e4191edb0009, likely client has closed socket
>     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
>     at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>     at java.lang.Thread.run(Thread.java:662)
>
> I just had to run PE and kill it in the middle.
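For context, the "PE" being killed above is org.apache.hadoop.hbase.PerformanceEvaluation. A minimal sketch of that kind of run (the command name and client count are assumptions from PE's usage text, and it needs a running cluster, so the command is only printed here):

```shell
# Hypothetical PerformanceEvaluation run; killing it mid-way (Ctrl-C) is what
# triggered the ZooKeeper EndOfStreamException quoted above.
PE="org.apache.hadoop.hbase.PerformanceEvaluation"
echo "hbase ${PE} sequentialWrite 1"
```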
>
> JM
>
>                                                                            0.96.0RC0    0.94.11   0.96/0.94
> org.apache.hadoop.hbase.PerformanceEvaluation$FilteredScanTest                 10.28      10.17     101.12%
> org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest                  966.01     810.58     119.18%
> org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest               98.50     255.71      38.52%
> org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest               39251.17   25682.11     152.83%
> org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange10Test     25844.88   28715.29      90.00%
> org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test    20029.48   18022.39     111.14%
> org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange1000Test    2692.16    2346.85     114.71%
> org.apache.hadoop.hbase.PerformanceEvaluation$SequentialReadTest             3002.18    2875.83     104.39%
> org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest           38995.50   26693.23     146.09%
>
> 2013/9/3 Stack <[email protected]>
>
>> On Mon, Sep 2, 2013 at 10:51 AM, Jean-Marc Spaggiari <[email protected]> wrote:
>>
>> > I have created:
>> > HBASE-9412
>> > HBASE-9413
>> > HBASE-9414
>> >
>> > I have not yet been able to reproduce the ZK error. I'm trying.
>>
>> Is it when you have a shell connection and then kill it?
>>
>> > Last, I tried, with no success, to set the log level to WARN to remove
>> > all DEBUG and INFO logs. Setting it to WARN removes the DEBUG lines,
>> > but I keep getting the INFO ones. It seems that something is setting
>> > the log level somewhere else, or it's not being read.
>> >
>> > Here is my log4j.properties file. I removed all the custom log levels
>> > to set WARN for org.apache. And it's still showing INFO...
>>
>> You did it by editing log4j and restarting, or in the UI? I think the UI
>> log level setting is broken... (new issue!)
>>
>> Thanks for trying it out JMS.
>>
>> So everything is slower in 0.96?
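The percentage column in the numbers above is just the 0.96.0RC0 throughput divided by the 0.94.11 throughput; for example, recomputing the RandomReadTest entry:

```shell
# ratio = throughput(0.96.0RC0) / throughput(0.94.11) * 100
awk 'BEGIN { printf "%.2f%%\n", 966.01 / 810.58 * 100 }'   # prints 119.18%
```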
>> St.Ack
>>
>> > JM
>> >
>> > # Define some default values that can be overridden by system properties
>> > hbase.root.logger=WARN,console
>> > hbase.security.logger=WARN,console
>> > hbase.log.dir=.
>> > hbase.log.file=hbase.log
>> >
>> > # Define the root logger to the system property "hbase.root.logger".
>> > log4j.rootLogger=${hbase.root.logger}
>> >
>> > # Logging Threshold
>> > log4j.threshold=ALL
>> >
>> > #
>> > # Daily Rolling File Appender
>> > #
>> > log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
>> > log4j.appender.DRFA.File=${hbase.log.dir}/${hbase.log.file}
>> >
>> > # Rollver at midnight
>> > log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
>> >
>> > # 30-day backup
>> > #log4j.appender.DRFA.MaxBackupIndex=30
>> > log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
>> >
>> > # Pattern format: Date LogLevel LoggerName LogMessage
>> > log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
>> >
>> > # Rolling File Appender properties
>> > hbase.log.maxfilesize=256MB
>> > hbase.log.maxbackupindex=20
>> >
>> > # Rolling File Appender
>> > log4j.appender.RFA=org.apache.log4j.RollingFileAppender
>> > log4j.appender.RFA.File=${hbase.log.dir}/${hbase.log.file}
>> >
>> > log4j.appender.RFA.MaxFileSize=${hbase.log.maxfilesize}
>> > log4j.appender.RFA.MaxBackupIndex=${hbase.log.maxbackupindex}
>> >
>> > log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
>> > log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
>> >
>> > #
>> > # Security audit appender
>> > #
>> > hbase.security.log.file=SecurityAuth.audit
>> > hbase.security.log.maxfilesize=256MB
>> > hbase.security.log.maxbackupindex=20
>> > log4j.appender.RFAS=org.apache.log4j.RollingFileAppender
>> > log4j.appender.RFAS.File=${hbase.log.dir}/${hbase.security.log.file}
>> > log4j.appender.RFAS.MaxFileSize=${hbase.security.log.maxfilesize}
>> > log4j.appender.RFAS.MaxBackupIndex=${hbase.security.log.maxbackupindex}
>> > log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
>> > log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
>> > log4j.category.SecurityLogger=${hbase.security.logger}
>> > log4j.additivity.SecurityLogger=false
>> > #log4j.logger.SecurityLogger.org.apache.hadoop.hbase.security.access.AccessController=TRACE
>> >
>> > #
>> > # Null Appender
>> > #
>> > log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
>> >
>> > #
>> > # console
>> > # Add "console" to rootlogger above if you want to use this
>> > #
>> > log4j.appender.console=org.apache.log4j.ConsoleAppender
>> > log4j.appender.console.target=System.err
>> > log4j.appender.console.layout=org.apache.log4j.PatternLayout
>> > log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
>> >
>> > # Custom Logging levels
>> >
>> > log4j.logger.ore.apache=WARN
>> >
>> > 2013/9/2 Jean-Marc Spaggiari <[email protected]>:
>> > > Hi St.Ack,
>> > >
>> > > I will open the related JIRAs in a few minutes.
>> > >
>> > > Regarding performance, RandomSeekScanTest is way slower, and only
>> > > RandomScanWithRange100Test was faster. The others were similar. For
>> > > RandomScanWithRange100Test I suspect that I don't have the right
>> > > number for 0.94, so 0.94.11 tests are running right now on the same
>> > > server with the same configuration. I will start to have numbers by
>> > > the end of the day, or else tomorrow morning, but it will most
>> > > probably take about 24h to get all of them.
>> > >
>> > > JM
>> > >
>> > > 2013/9/2 Stack <[email protected]>:
>> > >> On Mon, Sep 2, 2013 at 9:41 AM, Jean-Marc Spaggiari <[email protected]> wrote:
>> > >>
>> > >>> Got it.
>> > >>>
>> > >>> I can't run the integration tests for now because I'm lacking some
>> > >>> servers :( I need to complete some HBase on RAID tests before I can
>> > >>> get those new servers.
>> > >>>
>> > >>> First thing is, start-hbase.cmd has the execute flag set.
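Note the custom level line above: it reads `log4j.logger.ore.apache` where `log4j.logger.org.apache` was presumably intended, which would explain why the WARN threshold never took effect on the `org.apache` loggers; this looks like the typo JM reports finding at the top of the thread. A one-liner fix, demonstrated on a scratch copy rather than a real conf/log4j.properties:

```shell
# Reproduce the presumed typo in a temp file, then fix the logger name with sed.
tmp=$(mktemp)
echo "log4j.logger.ore.apache=WARN" > "$tmp"
sed -i 's/^log4j\.logger\.ore\.apache=/log4j.logger.org.apache=/' "$tmp"
fixed=$(cat "$tmp")
echo "$fixed"   # prints log4j.logger.org.apache=WARN
rm -f "$tmp"
```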
>> > >>> I don't think it's required, and it will help with tab completion
>> > >>> if we can unset it.
>> > >>>
>> > >>> For 0.96.0RC0, here are my results:
>> > >>> First, I get two .out files each time I start the server, instead
>> > >>> of the usual one... with the same timestamp.
>> > >>>
>> > >>> -rw-r--r-- 1 jmspaggiari jmspaggiari 0 Aug 31 15:38 hbase-jmspaggiari-master-t430s.out
>> > >>> -rw-r--r-- 1 jmspaggiari jmspaggiari 0 Aug 31 15:38 hbase-jmspaggiari-master-t430s.out.1
>> > >>>
>> > >>> In the UI, we say "The .META. table holds references to all User
>> > >>> Table regions" but the table name is "hbase:meta" and not ".META.".
>> > >>>
>> > >>> In the logs, I found this exception that I did not have before:
>> > >>> 2013-08-31 18:45:05,490 WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] server.NIOServerCnxn: caught end of stream exception
>> > >>> EndOfStreamException: Unable to read additional data from client sessionid 0x140d68bb9d50004, likely client has closed socket
>> > >>>     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
>> > >>>     at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>> > >>>     at java.lang.Thread.run(Thread.java:662)
>> > >>>
>> > >>> I ran PE over 48h. I don't have another 0.96 baseline to compare
>> > >>> with, so I compared with 0.94.
>> > >>> org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest is
>> > >>> about 3 times slower with 0.96.
>> > >>> org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test
>> > >>> seems to be 2 times faster.
>> > >>>
>> > >>> Writes are also faster, but I changed my hard drive since I ran
>> > >>> 0.94, so I'm currently re-running 0.94 and will provide another,
>> > >>> more accurate comparison soon.
>> > >>>
>> > >>> Ran LoadTestTool: Failed to write keys: 0
>> > >>>
>> > >>> I have not been able to run IntegrationTestLoadAndVerify nor
>> > >>> IntegrationTestBigLinkedList because of the lack of servers.
>> > >>>
>> > >>> So overall, it seems to be working fine, but I have not been able
>> > >>> to test this release as deeply as I usually test the 0.94 releases.
>> > >>
>> > >> Thank you for taking it for a spin JMS -- you the man. If you don't
>> > >> open issues for the teething stuff, I will.
>> > >>
>> > >> So 0.96 is slower doing RandomSeekScanTest but faster on your other
>> > >> tests? I can take a look too.
>> > >>
>> > >> St.Ack
