Sorry, but I'd call it a "possible environment-related problem" or "some feature may not work well in a specific environment", rather than a flaky test.
Will check against 1.4.7 released source package before opening any JIRA.

Best Regards,
Yu

On Fri, 12 Apr 2019 at 11:37, Andrew Purtell <andrew.purt...@gmail.com> wrote:

> And if they pass in my environment, then what should we call it then. I
> have no doubt you are seeing failures. Therefore can you please file JIRAs
> and attach information that can help identify a fix. Thanks.
>
> > On Apr 11, 2019, at 8:35 PM, Yu Li <car...@gmail.com> wrote:
> >
> > I ran the test suite with the -Dsurefire.rerunFailingTestsCount=2 option
> > and on two different env separately, so it sums up to 6 times stable
> > failure for each case, and from my perspective this is not flaky.
> >
> > IIRC last time when verifying 1.4.7 on the same env no such issue
> > observed, will double check.
> >
> > Best Regards,
> > Yu
> >
> > On Fri, 12 Apr 2019 at 00:07, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> >
> >> There are two failure cases it looks like. And this looks like flakes.
> >>
> >> The wrong FS assertions are not something I see when I run these tests
> >> myself. I am not able to investigate something I can’t reproduce. What I
> >> suggest is since you can reproduce do a git bisect to find the commit
> >> that introduced the problem. Then we can revert it. As an alternative we
> >> can open a JIRA, report the problem, temporarily @ignore the test, and
> >> continue. This latter option only should be done if we are fairly
> >> confident it is a test only problem.
> >>
> >> The connect exceptions are interesting. I see these sometimes when the
> >> suite is executed, not this particular case, but when the failed test is
> >> executed by itself it always passes. It is possible some change to
> >> classes related to the minicluster or startup or shutdown timing are the
> >> cause, but it is test time flaky behavior. I’m not happy about this but
> >> it doesn’t actually fail the release because the failure is never
> >> repeatable when the test is run standalone.
> >>
> >> In general it would be great if some attention was paid to test
> >> cleanliness on branch-1. As RM I’m not in a position to insist that
> >> everything is perfect or there will never be another 1.x release,
> >> certainly not from branch-1. So, tests which fail repeatedly block a
> >> release IMHO but flakes do not.
> >>
> >>> On Apr 10, 2019, at 11:20 PM, Yu Li <car...@gmail.com> wrote:
> >>>
> >>> -1
> >>>
> >>> Observed many UT failures when checking the source package (tried
> >>> multiple rounds on two different environments, MacOs and Linux, got the
> >>> same result), including (but not limited to):
> >>>
> >>> TestBulkload:
> >>> shouldBulkLoadSingleFamilyHLog(org.apache.hadoop.hbase.regionserver.TestBulkLoad)
> >>> Time elapsed: 0.083 s <<< ERROR!
> >>> java.lang.IllegalArgumentException: Wrong FS:
> >>> file:/var/folders/t6/vch4nh357f98y1wlq09lbm7h0000gn/T/junit1805329913454564189/junit8020757893576011944/data/default/shouldBulkLoadSingleFamilyHLog/8f4a6b584533de2fd1bf3c398dfaac29,
> >>> expected: hdfs://localhost:55938
> >>>   at org.apache.hadoop.hbase.regionserver.TestBulkLoad.testRegionWithFamiliesAndSpecifiedTableName(TestBulkLoad.java:246)
> >>>   at org.apache.hadoop.hbase.regionserver.TestBulkLoad.testRegionWithFamilies(TestBulkLoad.java:256)
> >>>   at org.apache.hadoop.hbase.regionserver.TestBulkLoad.shouldBulkLoadSingleFamilyHLog(TestBulkLoad.java:150)
> >>>
> >>> TestStoreFile:
> >>> testCacheOnWriteEvictOnClose(org.apache.hadoop.hbase.regionserver.TestStoreFile)
> >>> Time elapsed: 0.083 s <<< ERROR!
> >>> java.net.ConnectException: Call From localhost/127.0.0.1 to
> >>> localhost:55938 failed on connection exception:
> >>> java.net.ConnectException: Connection refused; For more details see:
> >>> http://wiki.apache.org/hadoop/ConnectionRefused
> >>>   at org.apache.hadoop.hbase.regionserver.TestStoreFile.writeStoreFile(TestStoreFile.java:1047)
> >>>   at org.apache.hadoop.hbase.regionserver.TestStoreFile.testCacheOnWriteEvictOnClose(TestStoreFile.java:908)
> >>>
> >>> TestHFile:
> >>> testEmptyHFile(org.apache.hadoop.hbase.io.hfile.TestHFile)
> >>> Time elapsed: 0.08 s <<< ERROR!
> >>> java.net.ConnectException: Call From
> >>> z05f06378.sqa.zth.tbsite.net/11.163.183.195 to localhost:35529 failed on
> >>> connection exception: java.net.ConnectException: Connection refused; For
> >>> more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> >>>   at org.apache.hadoop.hbase.io.hfile.TestHFile.testEmptyHFile(TestHFile.java:90)
> >>> Caused by: java.net.ConnectException: Connection refused
> >>>   at org.apache.hadoop.hbase.io.hfile.TestHFile.testEmptyHFile(TestHFile.java:90)
> >>>
> >>> TestBlocksScanned:
> >>> testBlocksScannedWithEncoding(org.apache.hadoop.hbase.regionserver.TestBlocksScanned)
> >>> Time elapsed: 0.069 s <<< ERROR!
> >>> java.lang.IllegalArgumentException: Wrong FS:
> >>> hdfs://localhost:35529/tmp/hbase-jueding.ly/hbase/data/default/TestBlocksScannedWithEncoding/a4a416cc3060d9820a621c294af0aa08,
> >>> expected: file:///
> >>>   at org.apache.hadoop.hbase.regionserver.TestBlocksScanned._testBlocksScanned(TestBlocksScanned.java:90)
> >>>   at org.apache.hadoop.hbase.regionserver.TestBlocksScanned.testBlocksScannedWithEncoding(TestBlocksScanned.java:86)
> >>>
> >>> And please let me know if any known issue I'm not aware of. Thanks.
> >>>
> >>> Best Regards,
> >>> Yu
> >>>
> >>>> On Mon, 8 Apr 2019 at 11:38, Yu Li <car...@gmail.com> wrote:
> >>>>
> >>>> The performance report LGTM, thanks! (and sorry for the lag due to
> >>>> Qingming Festival Holiday here in China)
> >>>>
> >>>> Still verifying the release, just some quick feedback: observed some
> >>>> incompatible changes in compatibility report including
> >>>> HBASE-21492/HBASE-21684 and worth a reminder in ReleaseNote.
> >>>>
> >>>> Irrelative but noticeable: the 1.4.9 release note URL is invalid on
> >>>> https://hbase.apache.org/downloads.html
> >>>>
> >>>> Best Regards,
> >>>> Yu
> >>>>
> >>>>> On Fri, 5 Apr 2019 at 08:45, Andrew Purtell <apurt...@apache.org> wrote:
> >>>>>
> >>>>> The difference is basically noise per the usual YCSB evaluation. Small
> >>>>> differences in workloads D and F (slightly worse) and workload E
> >>>>> (slightly better) that do not indicate serious regression.
> >>>>>
> >>>>> Linux version 4.14.55-62.37.amzn1.x86_64
> >>>>> c3.8xlarge x 5
> >>>>> OpenJDK Runtime Environment (build 1.8.0_181-shenandoah-b13)
> >>>>> -Xms20g -Xmx20g -XX:+UseG1GC -XX:+AlwaysPreTouch -XX:+UseNUMA
> >>>>> -XX:-UseBiasedLocking -XX:+ParallelRefProcEnabled
> >>>>> Hadoop 2.9.2
> >>>>> Init: Load 100 M rows and snapshot
> >>>>> Run: Delete table, clone and redeploy from snapshot, run 10 M operations
> >>>>> Args: -threads 100 -target 50000
> >>>>> Test table: {NAME => 'u', BLOOMFILTER => 'ROW', VERSIONS => '1',
> >>>>> IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',
> >>>>> DATA_BLOCK_ENCODING => 'ROW_INDEX_V1', TTL => 'FOREVER',
> >>>>> COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
> >>>>> BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> >>>>>
> >>>>> YCSB Workload A
> >>>>>
> >>>>> target 50k/op/s                                    1.4.9     1.5.0
> >>>>>
> >>>>> [OVERALL], RunTime(ms)                            200592    200583
> >>>>> [OVERALL], Throughput(ops/sec)                     49852     49855
> >>>>> [READ], AverageLatency(us)                           544       559
> >>>>> [READ], MinLatency(us)                               267       292
> >>>>> [READ], MaxLatency(us)                            165631    185087
> >>>>> [READ], 95thPercentileLatency(us)                    738       742
> >>>>> [READ], 99thPercentileLatency(us)                   1877      1961
> >>>>> [UPDATE], AverageLatency(us)                        1370      1181
> >>>>> [UPDATE], MinLatency(us)                             702       646
> >>>>> [UPDATE], MaxLatency(us)                          180735    177279
> >>>>> [UPDATE], 95thPercentileLatency(us)                 1943      1652
> >>>>> [UPDATE], 99thPercentileLatency(us)                 3257      3085
> >>>>>
> >>>>> YCSB Workload B
> >>>>>
> >>>>> target 50k/op/s                                    1.4.9     1.5.0
> >>>>>
> >>>>> [OVERALL], RunTime(ms)                            200599    200581
> >>>>> [OVERALL], Throughput(ops/sec)                     49850     49855
> >>>>> [READ], AverageLatency(us)                           454       471
> >>>>> [READ], MinLatency(us)                               203       213
> >>>>> [READ], MaxLatency(us)                            183423    174207
> >>>>> [READ], 95thPercentileLatency(us)                    563       599
> >>>>> [READ], 99thPercentileLatency(us)                   1360      1172
> >>>>> [UPDATE], AverageLatency(us)                        1064      1029
> >>>>> [UPDATE], MinLatency(us)                             746       726
> >>>>> [UPDATE], MaxLatency(us)                          163455    101631
> >>>>> [UPDATE], 95thPercentileLatency(us)                 1327      1157
> >>>>> [UPDATE], 99thPercentileLatency(us)                 2241      1898
> >>>>>
> >>>>> YCSB Workload C
> >>>>>
> >>>>> target 50k/op/s                                    1.4.9     1.5.0
> >>>>>
> >>>>> [OVERALL], RunTime(ms)                            200541    200538
> >>>>> [OVERALL], Throughput(ops/sec)                     49865     49865
> >>>>> [READ], AverageLatency(us)                           332       327
> >>>>> [READ], MinLatency(us)                               175       179
> >>>>> [READ], MaxLatency(us)                            210559    170367
> >>>>> [READ], 95thPercentileLatency(us)                    410       396
> >>>>> [READ], 99thPercentileLatency(us)                    871       892
> >>>>>
> >>>>> YCSB Workload D
> >>>>>
> >>>>> target 50k/op/s                                    1.4.9     1.5.0
> >>>>>
> >>>>> [OVERALL], RunTime(ms)                            200579    200562
> >>>>> [OVERALL], Throughput(ops/sec)                     49855     49859
> >>>>> [READ], AverageLatency(us)                           487       547
> >>>>> [READ], MinLatency(us)                               210       214
> >>>>> [READ], MaxLatency(us)                            192255    177535
> >>>>> [READ], 95thPercentileLatency(us)                    973      1529
> >>>>> [READ], 99thPercentileLatency(us)                   1836      2683
> >>>>> [INSERT], AverageLatency(us)                        1239      1152
> >>>>> [INSERT], MinLatency(us)                             807       788
> >>>>> [INSERT], MaxLatency(us)                          184575    148735
> >>>>> [INSERT], 95thPercentileLatency(us)                 1496      1243
> >>>>> [INSERT], 99thPercentileLatency(us)                 2965      2495
> >>>>>
> >>>>> YCSB Workload E
> >>>>>
> >>>>> target 10k/op/s                                    1.4.9     1.5.0
> >>>>>
> >>>>> [OVERALL], RunTime(ms)                            100605    100568
> >>>>> [OVERALL], Throughput(ops/sec)                      9939      9943
> >>>>> [SCAN], AverageLatency(us)                          3548      2687
> >>>>> [SCAN], MinLatency(us)                               696       678
> >>>>> [SCAN], MaxLatency(us)                           1059839    238463
> >>>>> [SCAN], 95thPercentileLatency(us)                   8327      6791
> >>>>> [SCAN], 99thPercentileLatency(us)                  17647     14415
> >>>>> [INSERT], AverageLatency(us)                        2688      1555
> >>>>> [INSERT], MinLatency(us)                             887       815
> >>>>> [INSERT], MaxLatency(us)                          173311    154623
> >>>>> [INSERT], 95thPercentileLatency(us)                 4455      2571
> >>>>> [INSERT], 99thPercentileLatency(us)                 9303      5375
> >>>>>
> >>>>> YCSB Workload F
> >>>>>
> >>>>> target 50k/op/s                                    1.4.9     1.5.0
> >>>>>
> >>>>> [OVERALL], RunTime(ms)                            200562    204178
> >>>>> [OVERALL], Throughput(ops/sec)                     49859     48976
> >>>>> [READ], AverageLatency(us)                           856      1137
> >>>>> [READ], MinLatency(us)                               262       257
> >>>>> [READ], MaxLatency(us)                            205567    222335
> >>>>> [READ], 95thPercentileLatency(us)                   2365      3475
> >>>>> [READ], 99thPercentileLatency(us)                   3099      4143
> >>>>> [READ-MODIFY-WRITE], AverageLatency(us)             2559      2917
> >>>>> [READ-MODIFY-WRITE], MinLatency(us)                 1100      1034
> >>>>> [READ-MODIFY-WRITE], MaxLatency(us)               208767    204799
> >>>>> [READ-MODIFY-WRITE], 95thPercentileLatency(us)      5747      7627
> >>>>> [READ-MODIFY-WRITE], 99thPercentileLatency(us)      7203      8919
> >>>>> [UPDATE], AverageLatency(us)                        1700      1777
> >>>>> [UPDATE], MinLatency(us)                             737       687
> >>>>> [UPDATE], MaxLatency(us)                           97983     94271
> >>>>> [UPDATE], 95thPercentileLatency(us)                 3377      4147
> >>>>> [UPDATE], 99thPercentileLatency(us)                 4147      4831
> >>>>>
> >>>>>> On Thu, Apr 4, 2019 at 1:14 AM Yu Li <car...@gmail.com> wrote:
> >>>>>>
> >>>>>> Thanks for the efforts boss.
> >>>>>>
> >>>>>> Since it's a new minor release, do we have performance comparison
> >>>>>> report with 1.4.9 as we did when releasing 1.4.0? If so, any
> >>>>>> reference? Many thanks!
> >>>>>>
> >>>>>> Best Regards,
> >>>>>> Yu
> >>>>>>
> >>>>>> On Thu, 4 Apr 2019 at 07:44, Andrew Purtell <apurt...@apache.org> wrote:
> >>>>>>
> >>>>>>> The fourth HBase 1.5.0 release candidate (RC3) is available for
> >>>>>>> download at
> >>>>>>> https://dist.apache.org/repos/dist/dev/hbase/hbase-1.5.0RC3/ and
> >>>>>>> Maven artifacts are available in the temporary repository
> >>>>>>> https://repository.apache.org/content/repositories/orgapachehbase-1292/
> >>>>>>>
> >>>>>>> The git tag corresponding to the candidate is '1.5.0RC3'
> >>>>>>> (b0bc7225c5).
> >>>>>>>
> >>>>>>> A detailed source and binary compatibility report for this release
> >>>>>>> is available for your review at
> >>>>>>> https://dist.apache.org/repos/dist/dev/hbase/hbase-1.5.0RC3/compat-check-report.html
> >>>>>>> .
> >>>>>>>
> >>>>>>> A list of the 115 issues resolved in this release can be found at
> >>>>>>> https://s.apache.org/K4Wk . The 1.5.0 changelog is derived from the
> >>>>>>> changelog of the last branch-1.4 release, 1.4.9.
> >>>>>>>
> >>>>>>> Please try out the candidate and vote +1/0/-1.
> >>>>>>>
> >>>>>>> The vote will be open for at least 72 hours. Unless objection I will
> >>>>>>> try to close it Friday April 12, 2019 if we have sufficient votes.
> >>>>>>>
> >>>>>>> Prior to making this announcement I made the following preflight
> >>>>>>> checks:
> >>>>>>>
> >>>>>>> RAT check passes (7u80)
> >>>>>>> Unit test suite passes (7u80, 8u181)*
> >>>>>>> Opened the UI in a browser, poked around
> >>>>>>> LTT load 100M rows with 100% verification and 20% updates (8u181)
> >>>>>>> ITBLL 1B rows with slowDeterministic monkey (8u181)
> >>>>>>> ITBLL 1B rows with serverKilling monkey (8u181)
> >>>>>>>
> >>>>>>> There are known flaky tests. See HBASE-21904 and HBASE-21905. These
> >>>>>>> flaky tests do not represent serious test failures that would
> >>>>>>> prevent a release.
> >>>>>>>
> >>>>>>> --
> >>>>>>> Best regards,
> >>>>>>> Andrew
> >>>>>
> >>>>> --
> >>>>> Best regards,
> >>>>> Andrew
> >>>>>
> >>>>> Words like orphans lost among the crosstalk, meaning torn from truth's
> >>>>> decrepit hands
> >>>>> - A23, Crosstalk
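
For anyone re-checking the YCSB comparison quoted in this thread, the relative change between the 1.4.9 and 1.5.0 columns can be recomputed with a short script. The sketch below is illustrative only: the figures are copied from the report for workloads D, E and F, while the names REPORT, percent_change and summarize are made up for this example, and no threshold for what counts as noise is implied.

```python
# Selected rows copied from the YCSB report quoted in this thread.
# Key: (workload, operation, metric); value: (1.4.9, 1.5.0).
# REPORT, percent_change and summarize are hypothetical names for this sketch.
REPORT = {
    ("D", "READ", "AverageLatency(us)"): (487, 547),
    ("D", "READ", "95thPercentileLatency(us)"): (973, 1529),
    ("E", "SCAN", "AverageLatency(us)"): (3548, 2687),
    ("E", "INSERT", "AverageLatency(us)"): (2688, 1555),
    ("F", "READ", "AverageLatency(us)"): (856, 1137),
    ("F", "OVERALL", "Throughput(ops/sec)"): (49859, 48976),
}


def percent_change(old, new):
    """Relative change from old to new, in percent (positive means new is larger)."""
    return (new - old) / old * 100.0


def summarize(report):
    """Map each (workload, operation, metric) key to its percent change."""
    return {key: percent_change(old, new) for key, (old, new) in report.items()}


if __name__ == "__main__":
    for (workload, op, metric), delta in summarize(REPORT).items():
        print(f"Workload {workload} [{op}] {metric}: {delta:+.1f}%")
```

For latency rows a positive percentage means 1.5.0 was slower; for the throughput row a negative percentage means 1.5.0 was slower.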