That sounds good.
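
On the readdir() point quoted below, here is a minimal Python sketch of why discovery order can differ between hosts and checkouts. It is not HBase or Surefire code. The helper names (discover_tests, make_demo_dir) and the file names are hypothetical, chosen only for illustration. The one fact relied on is that os.listdir() documents its result as being in arbitrary order.

```python
import os
import tempfile


def discover_tests(test_dir, deterministic=True):
    """Return names in test_dir that start with "Test".

    os.listdir() makes no ordering promise, so the raw order may differ
    between hosts, filesystems, and checkouts. Sorting gives a stable
    order that can be reproduced elsewhere.
    """
    names = [n for n in os.listdir(test_dir) if n.startswith("Test")]
    return sorted(names) if deterministic else names


def make_demo_dir():
    """Create a scratch directory with a few empty, hypothetical test files."""
    d = tempfile.mkdtemp()
    for name in ("TestStoreFile.java", "TestBulkLoad.java",
                 "TestHFile.java", "README"):
        open(os.path.join(d, name), "w").close()
    return d


if __name__ == "__main__":
    demo = make_demo_dir()
    print("raw order:   ", discover_tests(demo, deterministic=False))
    print("sorted order:", discover_tests(demo))
```

If a group of failures only shows up under one particular ordering, pinning the order this way is one possible first step toward reproducing it on another machine.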
> On Apr 11, 2019, at 9:16 PM, Yu Li <[email protected]> wrote:
>
> Fine, let's focus on verifying whether it's a real problem rather than
> arguing about wording, after all that's not my intention...
>
> As mentioned, I participated in the 1.4.7 release vote[1] and IIRC I was
> using the same env and all tests passed w/o issue, that's where my concern
> lies and the main reason I gave a -1 vote. I'm running against 1.4.7 source
> on the same now and let's see the result.
>
> [1] https://www.mail-archive.com/[email protected]/msg51380.html
>
> Best Regards,
> Yu
>
>
> On Fri, 12 Apr 2019 at 12:05, Andrew Purtell <[email protected]>
> wrote:
>
>> I believe the test execution order matters. We run some tests in parallel.
>> The ordering of tests is determined by readdir() results and this differs
>> from host to host and checkout to checkout. So when you see a repeatable
>> group of failures, that’s great. And when someone else doesn’t see those
>> same tests fail, or they cannot be reproduced when running by themselves,
>> the commonly accepted term of art for this is “flaky”.
>>
>>
>>> On Apr 11, 2019, at 8:52 PM, Yu Li <[email protected]> wrote:
>>>
>>> Sorry but I'd call it "possible environment related problem" or "some
>>> feature may not work well in specific environment", rather than a flaky.
>>>
>>> Will check against 1.4.7 released source package before opening any JIRA.
>>>
>>> Best Regards,
>>> Yu
>>>
>>>
>>> On Fri, 12 Apr 2019 at 11:37, Andrew Purtell <[email protected]>
>>> wrote:
>>>
>>>> And if they pass in my environment, then what should we call it then. I
>>>> have no doubt you are seeing failures. Therefore can you please file JIRAs
>>>> and attach information that can help identify a fix. Thanks.
>>>>
>>>>> On Apr 11, 2019, at 8:35 PM, Yu Li <[email protected]> wrote:
>>>>>
>>>>> I ran the test suite with the -Dsurefire.rerunFailingTestsCount=2 option
>>>>> and on two different env separately, so it sums up to 6 times stable
>>>>> failure for each case, and from my perspective this is not flaky.
>>>>>
>>>>> IIRC last time when verifying 1.4.7 on the same env no such issue observed,
>>>>> will double check.
>>>>>
>>>>> Best Regards,
>>>>> Yu
>>>>>
>>>>>
>>>>> On Fri, 12 Apr 2019 at 00:07, Andrew Purtell <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> There are two failure cases it looks like. And this looks like flakes.
>>>>>>
>>>>>> The wrong FS assertions are not something I see when I run these tests
>>>>>> myself. I am not able to investigate something I can’t reproduce. What I
>>>>>> suggest is since you can reproduce do a git bisect to find the commit that
>>>>>> introduced the problem. Then we can revert it. As an alternative we can
>>>>>> open a JIRA, report the problem, temporarily @ignore the test, and
>>>>>> continue. This latter option only should be done if we are fairly confident
>>>>>> it is a test only problem.
>>>>>>
>>>>>> The connect exceptions are interesting. I see these sometimes when the
>>>>>> suite is executed, not this particular case, but when the failed test is
>>>>>> executed by itself it always passes. It is possible some change to classes
>>>>>> related to the minicluster or startup or shutdown timing are the cause, but
>>>>>> it is test time flaky behavior. I’m not happy about this but it doesn’t
>>>>>> actually fail the release because the failure is never repeatable when the
>>>>>> test is run standalone.
>>>>>>
>>>>>> In general it would be great if some attention was paid to test
>>>>>> cleanliness on branch-1. As RM I’m not in a position to insist that
>>>>>> everything is perfect or there will never be another 1.x release, certainly
>>>>>> not from branch-1. So, tests which fail repeatedly block a release IMHO but
>>>>>> flakes do not.
>>>>>>
>>>>>>
>>>>>>> On Apr 10, 2019, at 11:20 PM, Yu Li <[email protected]> wrote:
>>>>>>>
>>>>>>> -1
>>>>>>>
>>>>>>> Observed many UT failures when checking the source package (tried multiple
>>>>>>> rounds on two different environments, MacOs and Linux, got the same
>>>>>>> result), including (but not limited to):
>>>>>>>
>>>>>>> TestBulkload:
>>>>>>> shouldBulkLoadSingleFamilyHLog(org.apache.hadoop.hbase.regionserver.TestBulkLoad)
>>>>>>> Time elapsed: 0.083 s <<< ERROR!
>>>>>>> java.lang.IllegalArgumentException: Wrong FS:
>>>>>>> file:/var/folders/t6/vch4nh357f98y1wlq09lbm7h0000gn/T/junit1805329913454564189/junit8020757893576011944/data/default/shouldBulkLoadSingleFamilyHLog/8f4a6b584533de2fd1bf3c398dfaac29,
>>>>>>> expected: hdfs://localhost:55938
>>>>>>> at org.apache.hadoop.hbase.regionserver.TestBulkLoad.testRegionWithFamiliesAndSpecifiedTableName(TestBulkLoad.java:246)
>>>>>>> at org.apache.hadoop.hbase.regionserver.TestBulkLoad.testRegionWithFamilies(TestBulkLoad.java:256)
>>>>>>> at org.apache.hadoop.hbase.regionserver.TestBulkLoad.shouldBulkLoadSingleFamilyHLog(TestBulkLoad.java:150)
>>>>>>>
>>>>>>> TestStoreFile:
>>>>>>> testCacheOnWriteEvictOnClose(org.apache.hadoop.hbase.regionserver.TestStoreFile)
>>>>>>> Time elapsed: 0.083 s <<< ERROR!
>>>>>>> java.net.ConnectException: Call From localhost/127.0.0.1 to localhost:55938
>>>>>>> failed on connection exception: java.net.ConnectException: Connection
>>>>>>> refused; For more details see:
>>>>>>> http://wiki.apache.org/hadoop/ConnectionRefused
>>>>>>> at org.apache.hadoop.hbase.regionserver.TestStoreFile.writeStoreFile(TestStoreFile.java:1047)
>>>>>>> at org.apache.hadoop.hbase.regionserver.TestStoreFile.testCacheOnWriteEvictOnClose(TestStoreFile.java:908)
>>>>>>>
>>>>>>> TestHFile:
>>>>>>> testEmptyHFile(org.apache.hadoop.hbase.io.hfile.TestHFile) Time elapsed:
>>>>>>> 0.08 s <<< ERROR!
>>>>>>> java.net.ConnectException: Call From
>>>>>>> z05f06378.sqa.zth.tbsite.net/11.163.183.195 to localhost:35529 failed on
>>>>>>> connection exception: java.net.ConnectException: Connection refused; For
>>>>>>> more details see: http://wiki.apache.org/hadoop/ConnectionRefused
>>>>>>> at org.apache.hadoop.hbase.io.hfile.TestHFile.testEmptyHFile(TestHFile.java:90)
>>>>>>> Caused by: java.net.ConnectException: Connection refused
>>>>>>> at org.apache.hadoop.hbase.io.hfile.TestHFile.testEmptyHFile(TestHFile.java:90)
>>>>>>>
>>>>>>> TestBlocksScanned:
>>>>>>> testBlocksScannedWithEncoding(org.apache.hadoop.hbase.regionserver.TestBlocksScanned)
>>>>>>> Time elapsed: 0.069 s <<< ERROR!
>>>>>>> java.lang.IllegalArgumentException: Wrong FS:
>>>>>>> hdfs://localhost:35529/tmp/hbase-jueding.ly/hbase/data/default/TestBlocksScannedWithEncoding/a4a416cc3060d9820a621c294af0aa08,
>>>>>>> expected: file:///
>>>>>>> at org.apache.hadoop.hbase.regionserver.TestBlocksScanned._testBlocksScanned(TestBlocksScanned.java:90)
>>>>>>> at org.apache.hadoop.hbase.regionserver.TestBlocksScanned.testBlocksScannedWithEncoding(TestBlocksScanned.java:86)
>>>>>>>
>>>>>>> And please let me know if any known issue I'm not aware of. Thanks.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Yu
>>>>>>>
>>>>>>>
>>>>>>>> On Mon, 8 Apr 2019 at 11:38, Yu Li <[email protected]> wrote:
>>>>>>>>
>>>>>>>> The performance report LGTM, thanks! (and sorry for the lag due to
>>>>>>>> Qingming Festival Holiday here in China)
>>>>>>>>
>>>>>>>> Still verifying the release, just some quick feedback: observed some
>>>>>>>> incompatible changes in compatibility report including
>>>>>>>> HBASE-21492/HBASE-21684 and worth a reminder in ReleaseNote.
>>>>>>>>
>>>>>>>> Irrelative but noticeable: the 1.4.9 release note URL is invalid on
>>>>>>>> https://hbase.apache.org/downloads.html
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Yu
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Fri, 5 Apr 2019 at 08:45, Andrew Purtell <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> The difference is basically noise per the usual YCSB evaluation. Small
>>>>>>>>> differences in workloads D and F (slightly worse) and workload E (slightly
>>>>>>>>> better) that do not indicate serious regression.
>>>>>>>>>
>>>>>>>>> Linux version 4.14.55-62.37.amzn1.x86_64
>>>>>>>>> c3.8xlarge x 5
>>>>>>>>> OpenJDK Runtime Environment (build 1.8.0_181-shenandoah-b13)
>>>>>>>>> -Xms20g -Xmx20g -XX:+UseG1GC -XX:+AlwaysPreTouch -XX:+UseNUMA
>>>>>>>>> -XX:-UseBiasedLocking -XX:+ParallelRefProcEnabled
>>>>>>>>> Hadoop 2.9.2
>>>>>>>>> Init: Load 100 M rows and snapshot
>>>>>>>>> Run: Delete table, clone and redeploy from snapshot, run 10 M operations
>>>>>>>>> Args: -threads 100 -target 50000
>>>>>>>>> Test table: {NAME => 'u', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY
>>>>>>>>> => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING =>
>>>>>>>>> 'ROW_INDEX_V1', TTL => 'FOREVER', COMPRESSION => 'SNAPPY', MIN_VERSIONS =>
>>>>>>>>> '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE =>
>>>>>>>>> '0'}
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> YCSB Workload A
>>>>>>>>>
>>>>>>>>> target 50k/op/s                                   1.4.9    1.5.0
>>>>>>>>>
>>>>>>>>> [OVERALL], RunTime(ms)                           200592   200583
>>>>>>>>> [OVERALL], Throughput(ops/sec)                    49852    49855
>>>>>>>>> [READ], AverageLatency(us)                          544      559
>>>>>>>>> [READ], MinLatency(us)                              267      292
>>>>>>>>> [READ], MaxLatency(us)                           165631   185087
>>>>>>>>> [READ], 95thPercentileLatency(us)                   738      742
>>>>>>>>> [READ], 99thPercentileLatency(us)                  1877     1961
>>>>>>>>> [UPDATE], AverageLatency(us)                       1370     1181
>>>>>>>>> [UPDATE], MinLatency(us)                            702      646
>>>>>>>>> [UPDATE], MaxLatency(us)                         180735   177279
>>>>>>>>> [UPDATE], 95thPercentileLatency(us)                1943     1652
>>>>>>>>> [UPDATE], 99thPercentileLatency(us)                3257     3085
>>>>>>>>>
>>>>>>>>> YCSB Workload B
>>>>>>>>>
>>>>>>>>> target 50k/op/s                                   1.4.9    1.5.0
>>>>>>>>>
>>>>>>>>> [OVERALL], RunTime(ms)                           200599   200581
>>>>>>>>> [OVERALL], Throughput(ops/sec)                    49850    49855
>>>>>>>>> [READ], AverageLatency(us)                          454      471
>>>>>>>>> [READ], MinLatency(us)                              203      213
>>>>>>>>> [READ], MaxLatency(us)                           183423   174207
>>>>>>>>> [READ], 95thPercentileLatency(us)                   563      599
>>>>>>>>> [READ], 99thPercentileLatency(us)                  1360     1172
>>>>>>>>> [UPDATE], AverageLatency(us)                       1064     1029
>>>>>>>>> [UPDATE], MinLatency(us)                            746      726
>>>>>>>>> [UPDATE], MaxLatency(us)                         163455   101631
>>>>>>>>> [UPDATE], 95thPercentileLatency(us)                1327     1157
>>>>>>>>> [UPDATE], 99thPercentileLatency(us)                2241     1898
>>>>>>>>>
>>>>>>>>> YCSB Workload C
>>>>>>>>>
>>>>>>>>> target 50k/op/s                                   1.4.9    1.5.0
>>>>>>>>>
>>>>>>>>> [OVERALL], RunTime(ms)                           200541   200538
>>>>>>>>> [OVERALL], Throughput(ops/sec)                    49865    49865
>>>>>>>>> [READ], AverageLatency(us)                          332      327
>>>>>>>>> [READ], MinLatency(us)                              175      179
>>>>>>>>> [READ], MaxLatency(us)                           210559   170367
>>>>>>>>> [READ], 95thPercentileLatency(us)                   410      396
>>>>>>>>> [READ], 99thPercentileLatency(us)                   871      892
>>>>>>>>>
>>>>>>>>> YCSB Workload D
>>>>>>>>>
>>>>>>>>> target 50k/op/s                                   1.4.9    1.5.0
>>>>>>>>>
>>>>>>>>> [OVERALL], RunTime(ms)                           200579   200562
>>>>>>>>> [OVERALL], Throughput(ops/sec)                    49855    49859
>>>>>>>>> [READ], AverageLatency(us)                          487      547
>>>>>>>>> [READ], MinLatency(us)                              210      214
>>>>>>>>> [READ], MaxLatency(us)                           192255   177535
>>>>>>>>> [READ], 95thPercentileLatency(us)                   973     1529
>>>>>>>>> [READ], 99thPercentileLatency(us)                  1836     2683
>>>>>>>>> [INSERT], AverageLatency(us)                       1239     1152
>>>>>>>>> [INSERT], MinLatency(us)                            807      788
>>>>>>>>> [INSERT], MaxLatency(us)                         184575   148735
>>>>>>>>> [INSERT], 95thPercentileLatency(us)                1496     1243
>>>>>>>>> [INSERT], 99thPercentileLatency(us)                2965     2495
>>>>>>>>>
>>>>>>>>> YCSB Workload E
>>>>>>>>>
>>>>>>>>> target 10k/op/s                                   1.4.9    1.5.0
>>>>>>>>>
>>>>>>>>> [OVERALL], RunTime(ms)                           100605   100568
>>>>>>>>> [OVERALL], Throughput(ops/sec)                     9939     9943
>>>>>>>>> [SCAN], AverageLatency(us)                         3548     2687
>>>>>>>>> [SCAN], MinLatency(us)                              696      678
>>>>>>>>> [SCAN], MaxLatency(us)                          1059839   238463
>>>>>>>>> [SCAN], 95thPercentileLatency(us)                  8327     6791
>>>>>>>>> [SCAN], 99thPercentileLatency(us)                 17647    14415
>>>>>>>>> [INSERT], AverageLatency(us)                       2688     1555
>>>>>>>>> [INSERT], MinLatency(us)                            887      815
>>>>>>>>> [INSERT], MaxLatency(us)                         173311   154623
>>>>>>>>> [INSERT], 95thPercentileLatency(us)                4455     2571
>>>>>>>>> [INSERT], 99thPercentileLatency(us)                9303     5375
>>>>>>>>>
>>>>>>>>> YCSB Workload F
>>>>>>>>>
>>>>>>>>> target 50k/op/s                                   1.4.9    1.5.0
>>>>>>>>>
>>>>>>>>> [OVERALL], RunTime(ms)                           200562   204178
>>>>>>>>> [OVERALL], Throughput(ops/sec)                    49859    48976
>>>>>>>>> [READ], AverageLatency(us)                          856     1137
>>>>>>>>> [READ], MinLatency(us)                              262      257
>>>>>>>>> [READ], MaxLatency(us)                           205567   222335
>>>>>>>>> [READ], 95thPercentileLatency(us)                  2365     3475
>>>>>>>>> [READ], 99thPercentileLatency(us)                  3099     4143
>>>>>>>>> [READ-MODIFY-WRITE], AverageLatency(us)            2559     2917
>>>>>>>>> [READ-MODIFY-WRITE], MinLatency(us)                1100     1034
>>>>>>>>> [READ-MODIFY-WRITE], MaxLatency(us)              208767   204799
>>>>>>>>> [READ-MODIFY-WRITE], 95thPercentileLatency(us)     5747     7627
>>>>>>>>> [READ-MODIFY-WRITE], 99thPercentileLatency(us)     7203     8919
>>>>>>>>> [UPDATE], AverageLatency(us)                       1700     1777
>>>>>>>>> [UPDATE], MinLatency(us)                            737      687
>>>>>>>>> [UPDATE], MaxLatency(us)                          97983    94271
>>>>>>>>> [UPDATE], 95thPercentileLatency(us)                3377     4147
>>>>>>>>> [UPDATE], 99thPercentileLatency(us)                4147     4831
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Thu, Apr 4, 2019 at 1:14 AM Yu Li <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks for the efforts boss.
>>>>>>>>>>
>>>>>>>>>> Since it's a new minor release, do we have performance comparison report
>>>>>>>>>> with 1.4.9 as we did when releasing 1.4.0? If so, any reference? Many
>>>>>>>>>> thanks!
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>> Yu
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, 4 Apr 2019 at 07:44, Andrew Purtell <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> The fourth HBase 1.5.0 release candidate (RC3) is available for download at
>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/hbase/hbase-1.5.0RC3/ and Maven
>>>>>>>>>>> artifacts are available in the temporary repository
>>>>>>>>>>> https://repository.apache.org/content/repositories/orgapachehbase-1292/
>>>>>>>>>>>
>>>>>>>>>>> The git tag corresponding to the candidate is '1.5.0RC3' (b0bc7225c5).
>>>>>>>>>>>
>>>>>>>>>>> A detailed source and binary compatibility report for this release is
>>>>>>>>>>> available for your review at
>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/hbase/hbase-1.5.0RC3/compat-check-report.html
>>>>>>>>>>> .
>>>>>>>>>>>
>>>>>>>>>>> A list of the 115 issues resolved in this release can be found at
>>>>>>>>>>> https://s.apache.org/K4Wk . The 1.5.0 changelog is derived from the
>>>>>>>>>>> changelog of the last branch-1.4 release, 1.4.9.
>>>>>>>>>>>
>>>>>>>>>>> Please try out the candidate and vote +1/0/-1.
>>>>>>>>>>>
>>>>>>>>>>> The vote will be open for at least 72 hours. Unless objection I will try to
>>>>>>>>>>> close it Friday April 12, 2019 if we have sufficient votes.
>>>>>>>>>>>
>>>>>>>>>>> Prior to making this announcement I made the following preflight checks:
>>>>>>>>>>>
>>>>>>>>>>> RAT check passes (7u80)
>>>>>>>>>>> Unit test suite passes (7u80, 8u181)*
>>>>>>>>>>> Opened the UI in a browser, poked around
>>>>>>>>>>> LTT load 100M rows with 100% verification and 20% updates (8u181)
>>>>>>>>>>> ITBLL 1B rows with slowDeterministic monkey (8u181)
>>>>>>>>>>> ITBLL 1B rows with serverKilling monkey (8u181)
>>>>>>>>>>>
>>>>>>>>>>> There are known flaky tests. See HBASE-21904 and HBASE-21905. These flaky
>>>>>>>>>>> tests do not represent serious test failures that would prevent a release.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Andrew
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best regards,
>>>>>>>>> Andrew
>>>>>>>>>
>>>>>>>>> Words like orphans lost among the crosstalk, meaning torn from truth's
>>>>>>>>> decrepit hands
>>>>>>>>> - A23, Crosstalk
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
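
As a rough aid for reading the YCSB figures quoted above, the relative change between 1.4.9 and 1.5.0 can be computed with a few lines of Python. This is only a sketch. The helper names (pct_change, report) are made up for illustration, and the rows are a small hand-copied subset of the Workload F table. Whether a given delta counts as noise is a judgment the script does not make.

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# (metric, 1.4.9 value, 1.5.0 value), copied from the Workload F table
# quoted in the thread; only a subset of rows is included here.
WORKLOAD_F = [
    ("[OVERALL] Throughput(ops/sec)", 49859, 48976),
    ("[READ] AverageLatency(us)", 856, 1137),
    ("[READ] 99thPercentileLatency(us)", 3099, 4143),
    ("[UPDATE] AverageLatency(us)", 1700, 1777),
]


def report(rows):
    """Return (metric, percent change rounded to one decimal) per row."""
    return [(name, round(pct_change(a, b), 1)) for name, a, b in rows]


if __name__ == "__main__":
    for name, delta in report(WORKLOAD_F):
        print(f"{name:40s} {delta:+.1f}%")
```

The same function can be pointed at any other row of the tables by adding a tuple to the list.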
