Fine, let's focus on verifying whether it's a real problem rather than
arguing about wording; after all, that's not my intention.

As mentioned, I participated in the 1.4.7 release vote[1] and, IIRC, I was
using the same env and all tests passed without issue. That's where my concern
lies and the main reason I gave a -1 vote. I'm now running the tests against
the 1.4.7 source on the same env, so let's see the result.

[1] https://www.mail-archive.com/[email protected]/msg51380.html

Best Regards,
Yu


On Fri, 12 Apr 2019 at 12:05, Andrew Purtell <[email protected]>
wrote:

> I believe the test execution order matters. We run some tests in parallel.
> The ordering of tests is determined by readdir() results and this differs
> from host to host and checkout to checkout. So when you see a repeatable
> group of failures, that’s great. And when someone else doesn’t see those
> same tests fail, or they cannot be reproduced when running by themselves,
> the commonly accepted term of art for this is “flaky”.
>
>
> > On Apr 11, 2019, at 8:52 PM, Yu Li <[email protected]> wrote:
> >
> > Sorry but I'd call it a "possible environment-related problem" or "some
> > feature may not work well in a specific environment", rather than flaky.
> >
> > Will check against the 1.4.7 released source package before opening any JIRA.
> >
> > Best Regards,
> > Yu
> >
> >
> > On Fri, 12 Apr 2019 at 11:37, Andrew Purtell <[email protected]>
> > wrote:
> >
> >> And if they pass in my environment, then what should we call it? I
> >> have no doubt you are seeing failures. Therefore can you please file JIRAs
> >> and attach information that can help identify a fix. Thanks.
> >>
> >>> On Apr 11, 2019, at 8:35 PM, Yu Li <[email protected]> wrote:
> >>>
> >>> I ran the test suite with the -Dsurefire.rerunFailingTestsCount=2 option
> >>> on two different envs separately, so each case failed consistently 6
> >>> times in total (3 attempts per env), and from my perspective this is not
> >>> flaky.
> >>>
> >>> IIRC no such issue was observed last time when verifying 1.4.7 on the
> >>> same env; will double check.
> >>>
> >>> Best Regards,
> >>> Yu
> >>>
> >>>
> >>> On Fri, 12 Apr 2019 at 00:07, Andrew Purtell <[email protected]>
> >>> wrote:
> >>>
> >>>> It looks like there are two failure cases, and they look like flakes.
> >>>>
> >>>> The wrong FS assertions are not something I see when I run these tests
> >>>> myself. I am not able to investigate something I can’t reproduce. What I
> >>>> suggest is, since you can reproduce it, do a git bisect to find the commit
> >>>> that introduced the problem. Then we can revert it. As an alternative we
> >>>> can open a JIRA, report the problem, temporarily @Ignore the test, and
> >>>> continue. This latter option should only be done if we are fairly
> >>>> confident it is a test-only problem.
> >>>>
> >>>> The connect exceptions are interesting. I see these sometimes when the
> >>>> suite is executed, not this particular case, but when the failed test is
> >>>> executed by itself it always passes. It is possible some change to classes
> >>>> related to the minicluster or to startup or shutdown timing is the cause,
> >>>> but it is test-time flaky behavior. I’m not happy about this but it doesn’t
> >>>> actually fail the release because the failure is never repeatable when the
> >>>> test is run standalone.
> >>>>
> >>>> In general it would be great if some attention was paid to test
> >>>> cleanliness on branch-1. As RM I’m not in a position to insist that
> >>>> everything is perfect or there will never be another 1.x release,
> >>>> certainly not from branch-1. So, tests which fail repeatedly block a
> >>>> release IMHO but flakes do not.
> >>>>
> >>>>
> >>>>> On Apr 10, 2019, at 11:20 PM, Yu Li <[email protected]> wrote:
> >>>>>
> >>>>> -1
> >>>>>
> >>>>> Observed many UT failures when checking the source package (tried
> >>>>> multiple rounds on two different environments, macOS and Linux, and got
> >>>>> the same result), including (but not limited to):
> >>>>>
> >>>>> TestBulkLoad:
> >>>>> shouldBulkLoadSingleFamilyHLog(org.apache.hadoop.hbase.regionserver.TestBulkLoad)
> >>>>> Time elapsed: 0.083 s  <<< ERROR!
> >>>>> java.lang.IllegalArgumentException: Wrong FS:
> >>>>> file:/var/folders/t6/vch4nh357f98y1wlq09lbm7h0000gn/T/junit1805329913454564189/junit8020757893576011944/data/default/shouldBulkLoadSingleFamilyHLog/8f4a6b584533de2fd1bf3c398dfaac29,
> >>>>> expected: hdfs://localhost:55938
> >>>>>      at org.apache.hadoop.hbase.regionserver.TestBulkLoad.testRegionWithFamiliesAndSpecifiedTableName(TestBulkLoad.java:246)
> >>>>>      at org.apache.hadoop.hbase.regionserver.TestBulkLoad.testRegionWithFamilies(TestBulkLoad.java:256)
> >>>>>      at org.apache.hadoop.hbase.regionserver.TestBulkLoad.shouldBulkLoadSingleFamilyHLog(TestBulkLoad.java:150)
> >>>>>
> >>>>> TestStoreFile:
> >>>>> testCacheOnWriteEvictOnClose(org.apache.hadoop.hbase.regionserver.TestStoreFile)
> >>>>> Time elapsed: 0.083 s  <<< ERROR!
> >>>>> java.net.ConnectException: Call From localhost/127.0.0.1 to
> >>>>> localhost:55938 failed on connection exception:
> >>>>> java.net.ConnectException: Connection refused; For more details see:
> >>>>> http://wiki.apache.org/hadoop/ConnectionRefused
> >>>>>      at org.apache.hadoop.hbase.regionserver.TestStoreFile.writeStoreFile(TestStoreFile.java:1047)
> >>>>>      at org.apache.hadoop.hbase.regionserver.TestStoreFile.testCacheOnWriteEvictOnClose(TestStoreFile.java:908)
> >>>>>
> >>>>> TestHFile:
> >>>>> testEmptyHFile(org.apache.hadoop.hbase.io.hfile.TestHFile)  Time elapsed:
> >>>>> 0.08 s  <<< ERROR!
> >>>>> java.net.ConnectException: Call From
> >>>>> z05f06378.sqa.zth.tbsite.net/11.163.183.195 to localhost:35529 failed on
> >>>>> connection exception: java.net.ConnectException: Connection refused; For
> >>>>> more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
> >>>>>      at org.apache.hadoop.hbase.io.hfile.TestHFile.testEmptyHFile(TestHFile.java:90)
> >>>>> Caused by: java.net.ConnectException: Connection refused
> >>>>>      at org.apache.hadoop.hbase.io.hfile.TestHFile.testEmptyHFile(TestHFile.java:90)
> >>>>>
> >>>>> TestBlocksScanned:
> >>>>> testBlocksScannedWithEncoding(org.apache.hadoop.hbase.regionserver.TestBlocksScanned)
> >>>>> Time elapsed: 0.069 s  <<< ERROR!
> >>>>> java.lang.IllegalArgumentException: Wrong FS:
> >>>>> hdfs://localhost:35529/tmp/hbase-jueding.ly/hbase/data/default/TestBlocksScannedWithEncoding/a4a416cc3060d9820a621c294af0aa08,
> >>>>> expected: file:///
> >>>>>      at org.apache.hadoop.hbase.regionserver.TestBlocksScanned._testBlocksScanned(TestBlocksScanned.java:90)
> >>>>>      at org.apache.hadoop.hbase.regionserver.TestBlocksScanned.testBlocksScannedWithEncoding(TestBlocksScanned.java:86)
> >>>>>
> >>>>> And please let me know if there is any known issue I'm not aware of. Thanks.
> >>>>>
> >>>>> Best Regards,
> >>>>> Yu
> >>>>>
> >>>>>
> >>>>>> On Mon, 8 Apr 2019 at 11:38, Yu Li <[email protected]> wrote:
> >>>>>>
> >>>>>> The performance report LGTM, thanks! (and sorry for the lag due to
> >>>>>> Qingming Festival Holiday here in China)
> >>>>>>
> >>>>>> Still verifying the release, just some quick feedback: observed some
> >>>>>> incompatible changes in the compatibility report, including
> >>>>>> HBASE-21492/HBASE-21684, which are worth a reminder in the release notes.
> >>>>>>
> >>>>>> Unrelated but worth noting: the 1.4.9 release note URL is invalid on
> >>>>>> https://hbase.apache.org/downloads.html
> >>>>>>
> >>>>>> Best Regards,
> >>>>>> Yu
> >>>>>>
> >>>>>>
> >>>>>>> On Fri, 5 Apr 2019 at 08:45, Andrew Purtell <[email protected]> wrote:
> >>>>>>>
> >>>>>>> The difference is basically noise per the usual YCSB evaluation. Small
> >>>>>>> differences in workloads D and F (slightly worse) and workload E
> >>>>>>> (slightly better) that do not indicate serious regression.
> >>>>>>>
> >>>>>>> Linux version 4.14.55-62.37.amzn1.x86_64
> >>>>>>> c3.8xlarge x 5
> >>>>>>> OpenJDK Runtime Environment (build 1.8.0_181-shenandoah-b13)
> >>>>>>> -Xms20g -Xmx20g -XX:+UseG1GC -XX:+AlwaysPreTouch -XX:+UseNUMA
> >>>>>>> -XX:-UseBiasedLocking -XX:+ParallelRefProcEnabled
> >>>>>>> Hadoop 2.9.2
> >>>>>>> Init: Load 100 M rows and snapshot
> >>>>>>> Run: Delete table, clone and redeploy from snapshot, run 10 M operations
> >>>>>>> Args: -threads 100 -target 50000
> >>>>>>> Test table: {NAME => 'u', BLOOMFILTER => 'ROW', VERSIONS => '1',
> >>>>>>> IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',
> >>>>>>> DATA_BLOCK_ENCODING => 'ROW_INDEX_V1', TTL => 'FOREVER',
> >>>>>>> COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
> >>>>>>> BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> >>>>>>>
> >>>>>>>
> >>>>>>> YCSB Workload A
> >>>>>>>
> >>>>>>> target 50k/op/s                       1.4.9    1.5.0
> >>>>>>> [OVERALL], RunTime(ms)               200592   200583
> >>>>>>> [OVERALL], Throughput(ops/sec)        49852    49855
> >>>>>>> [READ], AverageLatency(us)              544      559
> >>>>>>> [READ], MinLatency(us)                  267      292
> >>>>>>> [READ], MaxLatency(us)               165631   185087
> >>>>>>> [READ], 95thPercentileLatency(us)       738      742
> >>>>>>> [READ], 99thPercentileLatency(us)      1877     1961
> >>>>>>> [UPDATE], AverageLatency(us)           1370     1181
> >>>>>>> [UPDATE], MinLatency(us)                702      646
> >>>>>>> [UPDATE], MaxLatency(us)             180735   177279
> >>>>>>> [UPDATE], 95thPercentileLatency(us)    1943     1652
> >>>>>>> [UPDATE], 99thPercentileLatency(us)    3257     3085
> >>>>>>>
> >>>>>>> YCSB Workload B
> >>>>>>>
> >>>>>>> target 50k/op/s                       1.4.9    1.5.0
> >>>>>>> [OVERALL], RunTime(ms)               200599   200581
> >>>>>>> [OVERALL], Throughput(ops/sec)        49850    49855
> >>>>>>> [READ], AverageLatency(us)              454      471
> >>>>>>> [READ], MinLatency(us)                  203      213
> >>>>>>> [READ], MaxLatency(us)               183423   174207
> >>>>>>> [READ], 95thPercentileLatency(us)       563      599
> >>>>>>> [READ], 99thPercentileLatency(us)      1360     1172
> >>>>>>> [UPDATE], AverageLatency(us)           1064     1029
> >>>>>>> [UPDATE], MinLatency(us)                746      726
> >>>>>>> [UPDATE], MaxLatency(us)             163455   101631
> >>>>>>> [UPDATE], 95thPercentileLatency(us)    1327     1157
> >>>>>>> [UPDATE], 99thPercentileLatency(us)    2241     1898
> >>>>>>>
> >>>>>>> YCSB Workload C
> >>>>>>>
> >>>>>>> target 50k/op/s                       1.4.9    1.5.0
> >>>>>>> [OVERALL], RunTime(ms)               200541   200538
> >>>>>>> [OVERALL], Throughput(ops/sec)        49865    49865
> >>>>>>> [READ], AverageLatency(us)              332      327
> >>>>>>> [READ], MinLatency(us)                  175      179
> >>>>>>> [READ], MaxLatency(us)               210559   170367
> >>>>>>> [READ], 95thPercentileLatency(us)       410      396
> >>>>>>> [READ], 99thPercentileLatency(us)       871      892
> >>>>>>>
> >>>>>>> YCSB Workload D
> >>>>>>>
> >>>>>>> target 50k/op/s                       1.4.9    1.5.0
> >>>>>>> [OVERALL], RunTime(ms)               200579   200562
> >>>>>>> [OVERALL], Throughput(ops/sec)        49855    49859
> >>>>>>> [READ], AverageLatency(us)              487      547
> >>>>>>> [READ], MinLatency(us)                  210      214
> >>>>>>> [READ], MaxLatency(us)               192255   177535
> >>>>>>> [READ], 95thPercentileLatency(us)       973     1529
> >>>>>>> [READ], 99thPercentileLatency(us)      1836     2683
> >>>>>>> [INSERT], AverageLatency(us)           1239     1152
> >>>>>>> [INSERT], MinLatency(us)                807      788
> >>>>>>> [INSERT], MaxLatency(us)             184575   148735
> >>>>>>> [INSERT], 95thPercentileLatency(us)    1496     1243
> >>>>>>> [INSERT], 99thPercentileLatency(us)    2965     2495
> >>>>>>>
> >>>>>>> YCSB Workload E
> >>>>>>>
> >>>>>>> target 10k/op/s                       1.4.9    1.5.0
> >>>>>>> [OVERALL], RunTime(ms)               100605   100568
> >>>>>>> [OVERALL], Throughput(ops/sec)         9939     9943
> >>>>>>> [SCAN], AverageLatency(us)             3548     2687
> >>>>>>> [SCAN], MinLatency(us)                  696      678
> >>>>>>> [SCAN], MaxLatency(us)              1059839   238463
> >>>>>>> [SCAN], 95thPercentileLatency(us)      8327     6791
> >>>>>>> [SCAN], 99thPercentileLatency(us)     17647    14415
> >>>>>>> [INSERT], AverageLatency(us)           2688     1555
> >>>>>>> [INSERT], MinLatency(us)                887      815
> >>>>>>> [INSERT], MaxLatency(us)             173311   154623
> >>>>>>> [INSERT], 95thPercentileLatency(us)    4455     2571
> >>>>>>> [INSERT], 99thPercentileLatency(us)    9303     5375
> >>>>>>>
> >>>>>>> YCSB Workload F
> >>>>>>>
> >>>>>>> target 50k/op/s                                  1.4.9    1.5.0
> >>>>>>> [OVERALL], RunTime(ms)                          200562   204178
> >>>>>>> [OVERALL], Throughput(ops/sec)                   49859    48976
> >>>>>>> [READ], AverageLatency(us)                         856     1137
> >>>>>>> [READ], MinLatency(us)                             262      257
> >>>>>>> [READ], MaxLatency(us)                          205567   222335
> >>>>>>> [READ], 95thPercentileLatency(us)                 2365     3475
> >>>>>>> [READ], 99thPercentileLatency(us)                 3099     4143
> >>>>>>> [READ-MODIFY-WRITE], AverageLatency(us)           2559     2917
> >>>>>>> [READ-MODIFY-WRITE], MinLatency(us)               1100     1034
> >>>>>>> [READ-MODIFY-WRITE], MaxLatency(us)             208767   204799
> >>>>>>> [READ-MODIFY-WRITE], 95thPercentileLatency(us)    5747     7627
> >>>>>>> [READ-MODIFY-WRITE], 99thPercentileLatency(us)    7203     8919
> >>>>>>> [UPDATE], AverageLatency(us)                      1700     1777
> >>>>>>> [UPDATE], MinLatency(us)                           737      687
> >>>>>>> [UPDATE], MaxLatency(us)                         97983    94271
> >>>>>>> [UPDATE], 95thPercentileLatency(us)               3377     4147
> >>>>>>> [UPDATE], 99thPercentileLatency(us)               4147     4831
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Thu, Apr 4, 2019 at 1:14 AM Yu Li <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Thanks for the efforts boss.
> >>>>>>>>
> >>>>>>>> Since it's a new minor release, do we have a performance comparison
> >>>>>>>> report with 1.4.9 as we did when releasing 1.4.0? If so, any reference?
> >>>>>>>> Many thanks!
> >>>>>>>>
> >>>>>>>> Best Regards,
> >>>>>>>> Yu
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thu, 4 Apr 2019 at 07:44, Andrew Purtell <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> The fourth HBase 1.5.0 release candidate (RC3) is available for
> >>>>>>>>> download at
> >>>>>>>>> https://dist.apache.org/repos/dist/dev/hbase/hbase-1.5.0RC3/ and Maven
> >>>>>>>>> artifacts are available in the temporary repository
> >>>>>>>>> https://repository.apache.org/content/repositories/orgapachehbase-1292/
> >>>>>>>>>
> >>>>>>>>> The git tag corresponding to the candidate is '1.5.0RC3' (b0bc7225c5).
> >>>>>>>>>
> >>>>>>>>> A detailed source and binary compatibility report for this release is
> >>>>>>>>> available for your review at
> >>>>>>>>> https://dist.apache.org/repos/dist/dev/hbase/hbase-1.5.0RC3/compat-check-report.html
> >>>>>>>>> .
> >>>>>>>>>
> >>>>>>>>> A list of the 115 issues resolved in this release can be found at
> >>>>>>>>> https://s.apache.org/K4Wk . The 1.5.0 changelog is derived from the
> >>>>>>>>> changelog of the last branch-1.4 release, 1.4.9.
> >>>>>>>>>
> >>>>>>>>> Please try out the candidate and vote +1/0/-1.
> >>>>>>>>>
> >>>>>>>>> The vote will be open for at least 72 hours. Unless there is an
> >>>>>>>>> objection I will try to close it Friday April 12, 2019 if we have
> >>>>>>>>> sufficient votes.
> >>>>>>>>>
> >>>>>>>>> Prior to making this announcement I made the following preflight
> >>>>>>>>> checks:
> >>>>>>>>>
> >>>>>>>>>  RAT check passes (7u80)
> >>>>>>>>>  Unit test suite passes (7u80, 8u181)*
> >>>>>>>>>  Opened the UI in a browser, poked around
> >>>>>>>>>  LTT load 100M rows with 100% verification and 20% updates (8u181)
> >>>>>>>>>  ITBLL 1B rows with slowDeterministic monkey (8u181)
> >>>>>>>>>  ITBLL 1B rows with serverKilling monkey (8u181)
> >>>>>>>>>
> >>>>>>>>> There are known flaky tests. See HBASE-21904 and HBASE-21905. These
> >>>>>>>>> flaky tests do not represent serious test failures that would prevent a
> >>>>>>>>> release.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Best regards,
> >>>>>>>>> Andrew
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Best regards,
> >>>>>>> Andrew
> >>>>>>>
> >>>>>>> Words like orphans lost among the crosstalk, meaning torn from
> >>>>>>> truth's decrepit hands
> >>>>>>> - A23, Crosstalk
> >>>>>>>
> >>>>>>
> >>>>
> >>
>
