Peter, do have a look at IntegrationTestRegionReplicaReplication.java. The ways to specify the options are documented at the top of the file. You need to add something like -DIntegrationTestRegionReplicaReplication.read_delay_ms.
________________________________________
From: Josh Elser <[email protected]>
Sent: Thursday, June 15, 2017 10:40 AM
To: [email protected]
Subject: Re: Problem with IntegrationTestRegionReplicaReplication
I'd start by trying read_delay_ms=60000, region_replication=2, num_keys_per_server=5000, num_regions_per_server=5, with maybe tens of reader and writer threads. Again, this can be quite dependent on the kind of hardware you have. You'll definitely have to tweak it ;)

On 6/15/17 4:44 AM, Peter Somogyi wrote:
> Thanks Josh and Devaraj!
>
> I will try to increase the timeouts. Devaraj, could you share the
> parameters you used when this test worked?
>
> On Thu, Jun 15, 2017 at 6:44 AM, Devaraj Das <[email protected]> wrote:
>
>> That sounds about right, Josh. Peter, in our internal testing we have seen
>> this test fail, and increasing the timeouts (look at the test code's options
>> for increasing timeouts) helped quite a bit.
>> ________________________________________
>> From: Josh Elser <[email protected]>
>> Sent: Wednesday, June 14, 2017 3:17 PM
>> To: [email protected]
>> Subject: Re: Problem with IntegrationTestRegionReplicaReplication
>>
>> On 6/14/17 3:53 AM, Peter Somogyi wrote:
>>> Hi,
>>>
>>> As one of my first tasks with HBase, I started to look into why
>>> IntegrationTestRegionReplicaReplication fails. I would like to get some
>>> suggestions from you.
>>>
>>> I noticed that when I run the test on a normal cluster or a minicluster, I
>>> get the same error messages: "Error checking data for key [null], no data
>>> returned". I looked into the code and here are my conclusions.
>>>
>>> There are multiple threads writing data in parallel, which is read by
>>> multiple reader threads simultaneously. Each writer gets a portion of the
>>> keys to write (e.g. 0-2000), and these keys are added to a
>>> ConstantDelayQueue. The reader threads get elements (e.g. key=1000) from
>>> the queue and assume that all the keys up to that one are already in the
>>> database. Since we're using multiple writers, another thread may not yet
>>> have written key=500, and verifying that key will cause the test failure.
>>>
>>> Do you think my assumption is correct?
>>
>> Hi Peter,
>>
>> No, as far as my memory serves, this is not correct. Readers are not made
>> aware of keys to verify until the write has occurred and some delay has
>> elapsed. The delay is there to give the internal region replication enough
>> time to take effect.
>>
>> So: primary write, pause, [region replication happens in the background],
>> add the updated key to the read queue, reader gets the key from the queue and
>> verifies the value on a replica.
>>
>> The primary should always have seen the new value for a key. If the test
>> shows that a replica does not see the result, it's either a timing issue
>> (you need to give HBase a larger delay to perform the region replication)
>> or a bug in the region replication framework itself. That said, if you can
>> show that you are seeing what you describe, that sounds like the test
>> framework itself is broken :)
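The ordering Josh describes (primary write, pause, then the key is handed to a reader, which verifies that specific key on a replica) can be sketched in plain Java. This is a minimal, hypothetical model, not the test's actual code: it uses the JDK's java.util.concurrent.DelayQueue in place of HBase's ConstantDelayQueue, and an assumed SimulatedReplica class in which each write becomes visible replicationLagMs after the primary write, standing in for background region replication. All class and method names here are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of "primary write, pause, verify on replica".
// JDK DelayQueue stands in for HBase's ConstantDelayQueue; SimulatedReplica
// stands in for background region replication. Not the actual test code.
final class DelayedVerifySketch {

    // A written key that readers may only see after readDelayMs has elapsed.
    static final class PendingKey implements Delayed {
        final long key;
        final long readyAtNanos;

        PendingKey(long key, long readDelayMs) {
            this.key = key;
            this.readyAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(readDelayMs);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(readyAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    // Assumed replica model: a key is visible on the replica lagMs after its primary write.
    static final class SimulatedReplica {
        private final Map<Long, Long> writtenAtNanos = new ConcurrentHashMap<>();
        private final long lagNanos;

        SimulatedReplica(long lagMs) {
            this.lagNanos = TimeUnit.MILLISECONDS.toNanos(lagMs);
        }

        void primaryWrite(long key) {
            writtenAtNanos.put(key, System.nanoTime());
        }

        boolean visibleOnReplica(long key) {
            Long writtenAt = writtenAtNanos.get(key);
            return writtenAt != null && System.nanoTime() - writtenAt >= lagNanos;
        }
    }

    // Runs the writers and readers and returns how many keys failed verification.
    static int run(int writers, int readers, int keysPerWriter, long readDelayMs, long replicationLagMs) {
        SimulatedReplica replica = new SimulatedReplica(replicationLagMs);
        DelayQueue<PendingKey> queue = new DelayQueue<>();
        AtomicInteger failures = new AtomicInteger();
        AtomicInteger remaining = new AtomicInteger(writers * keysPerWriter);
        Thread[] threads = new Thread[writers + readers];

        for (int w = 0; w < writers; w++) {
            long first = (long) w * keysPerWriter; // each writer owns a disjoint key range
            threads[w] = new Thread(() -> {
                for (long k = first; k < first + keysPerWriter; k++) {
                    replica.primaryWrite(k);                   // 1. write to the primary
                    queue.put(new PendingKey(k, readDelayMs)); // 2. hand the key to readers, delayed
                }
            });
        }
        for (int r = 0; r < readers; r++) {
            threads[writers + r] = new Thread(() -> {
                // Each successful decrement claims exactly one queued key.
                while (remaining.getAndDecrement() > 0) {
                    try {
                        PendingKey pending = queue.take();            // 3. blocks until a delay expires
                        if (!replica.visibleOnReplica(pending.key)) { // 4. verify that exact key only
                            failures.incrementAndGet();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
        }

        for (Thread t : threads) {
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return failures.get();
    }
}
```

Under these assumptions, DelayedVerifySketch.run(4, 4, 200, 100, 20) reports 0 failures: take() never hands out a key before its 100 ms delay, which already exceeds the simulated 20 ms lag, and each reader checks only the key it dequeued, never "all keys up to it". With readDelayMs = 0 and a very large lag, every key fails, which is the timing-issue case Josh mentions: the remedy there is a larger read delay.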

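Josh's suggested starting values can also be written down as JVM system properties, in the form Devaraj points to. This is a hypothetical helper: it assumes every option takes the same IntegrationTestRegionReplicaReplication. prefix used in Devaraj's read_delay_ms example, so check the exact names against the option list at the top of IntegrationTestRegionReplicaReplication.java. The thread does not name the reader/writer thread-count options, so they are left out rather than guessed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the option prefix is an assumption based on the
// -DIntegrationTestRegionReplicaReplication.read_delay_ms example in the thread.
final class ReplicaTestFlags {
    static final String PREFIX = "IntegrationTestRegionReplicaReplication.";

    // Starting values from Josh's reply; expect to tune them for your hardware.
    static Map<String, String> suggestedStartingValues() {
        Map<String, String> opts = new LinkedHashMap<>(); // keeps the order Josh listed them in
        opts.put("read_delay_ms", "60000");
        opts.put("region_replication", "2");
        opts.put("num_keys_per_server", "5000");
        opts.put("num_regions_per_server", "5");
        // Reader/writer thread-count option names are not given in the thread; omitted on purpose.
        return opts;
    }

    // Turns each option into a "-D<prefix><name>=<value>" system-property flag.
    static List<String> toSystemPropertyFlags(Map<String, String> opts) {
        List<String> flags = new ArrayList<>();
        for (Map.Entry<String, String> e : opts.entrySet()) {
            flags.add("-D" + PREFIX + e.getKey() + "=" + e.getValue());
        }
        return flags;
    }
}
```

Joining the result with spaces yields -DIntegrationTestRegionReplicaReplication.read_delay_ms=60000 -DIntegrationTestRegionReplicaReplication.region_replication=2 -DIntegrationTestRegionReplicaReplication.num_keys_per_server=5000 -DIntegrationTestRegionReplicaReplication.num_regions_per_server=5.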