Hi Kristian! Thank you for the review remarks. I have committed a patch addressing your comments.
http://lists.askmonty.org/pipermail/commits/2014-October/006775.html See inline for my response on specific comments. On Mon, Sep 15, 2014 at 9:25 AM, Kristian Nielsen <[email protected]> wrote: > Nirbhay Choubey <[email protected]> writes: > > > ------------------------------------------------------------ > > revno: 3879 > > revision-id: [email protected] > > parent: [email protected] > > committer: Nirbhay Choubey <[email protected]> > > branch nick: maria-10.0-galera_6593 > > timestamp: Fri 2014-09-12 14:16:22 -0400 > > message: > > MDEV-6593 : domain_id based replication filters > > > > Added support for IGNORE_DOMAIN_IDS and DO_DOMAIN_IDS. > > Here is my review. > > Some general comments, followed by some detailed comments in-line in the > patch: > > 1. Did you prepare any documentation? You could either add it to the > knowledgebase directly, or write up a text file and ask Daniel to add it in > relevant places. > [NC] I haven't but I will prepare it and pass it down to Daniel. > > 2. Did you run the full test suite? I would have expected at least some > tests > in the multi_source suite to need modification. The chance touches enough > things that you should test it in a feature tree in Buildbot to ensure > that it > is green before pushing it to a main tree. > [NC] You are right. I did that now and updated the result files. > 3. I think some extra test cases are needed: > > - You should test (and document) what happens when the same domain_id is > in > both the DO and the IGNORE list. Is it actually ever sensible to have > both > DO_DOMAIN_IDS and IGNORE_DOMAIN_IDS ? If not, I think it should be an > error > to try. > [NC] You are right, setting both DO_ and IGNORE_ at the same time would not make any sense. But I actually followed the pattern of the existing replication, where both _do_ and _ignore_ filters are allowed to hold values at the same time and _do_ is preferred in case both are set. > - You should also add a test using non-GTID mode. > [NC] Done. > > - You should also add a test where parallel replication is used > (--slave-parallel-threads > 0). > [NC] Done. > > - Add a test for the following scenario when using GTID mode: Some domain > id > is ignored, some events are received from the master and ignored. Then > the > slave is stopped, and the ignore rule is removed. What happens when the > slave is restarted and reconnects? I think the previously ignored > domain id > events will now be re-fetched and replicated. I guess that's ok, but it > should be documented. Also test what happens when binlog files on the > master have been rotated and purged, so that the events are no longer > available on the master (an error should occur). > [NC] As the filtering takes place in the IO thread. I have added some test cases covering various scenarios including ones where IO thread crashes while reading COMMIT/XID event. In this scenario if the filtering is ON the GTID is added to the ignored list (ign_gtids) on GTID event, so when the crash happens before COMMIT/XID event, that particular GTID gets ignored after the slave reconnects. > - Add a similar test for when not using GTID mode, the last event > replicated > by the slave is ignored, the slave is stopped, the ignore rule removed > and > the slave restarted; will the last event now be re-fetched or not (I > think > not). > [NC] Done. > > - Add a test that checks that seconds_behind_master is properly updated to > zero even when the last event is ignored due to > DO_DOMAIN_IDS/IGNORE_DOMAIN_IDS. It can perhaps be a bit tricky to test, > but I think you can use this existing test as a basis: > mysql-test/suite/rpl/t/rpl_parallel2.test > [NC] Done. > > - It would also be good to check that the code works when used on an old > master (which does not send GTID events). You can do this by creating a > binlog file with a 5.5 server and storing it in mysql-test/std_data/, > and > then installing it in a master; there should be several other test cases > that already do this. > [NC] Done. > > 4. Also see detailed comments for some possible problems with the > implementation. The most serious is probably to ensure that events are not > skipped after the end of the group, we need a couple of tests for this, see > below. Also, there is a small memory leak, see below. > > > +DO_DOMAIN_IDS (before) : 1 > > +IGNORE_DOMAIN_IDS (before) : 2 > > +CHANGE MASTER TO DO_DOMAIN_IDS=(4,4,5,1,7,7,7,1,1,2,6,8,1,4,5,5,9,3), > MASTER_USE_GTID=current_pos; > > So if a domain_id is specified multiple times, the redundant ones will be > ignored. > [NC] Right. Also, there was a bug in the current IGNORE_SERVER_IDS implementation. Its fixed now. > It would seem preferable to give an error... but as far as I can see the > behaviour is the same for IGNORE_SERVER_IDS, so better to be consistent as > you > did in the code. Just be sure to add this to the documentation. > [NC] Ok. > > > > === added file 'mysql-test/suite/rpl/t/rpl_domain_id_filter.test' > > --- a/mysql-test/suite/rpl/t/rpl_domain_id_filter.test 1970-01-01 > 00:00:00 +0000 > > +++ b/mysql-test/suite/rpl/t/rpl_domain_id_filter.test 2014-09-12 > 18:16:22 +0000 > > > +SET @gtid_domain_id_saved = @@session.gtid_domain_id; > > This is not needed, nor do you use it anywhere, you can remove it. > [NC] was a leftover, removed. > > > +##### Case 3: When both DO_DOMAIN_IDS and IGNORE_DOMAIN_IDS are > non-empty > > + > > +source include/stop_slave.inc; > > +let $do_domain_ids_before= query_get_value(SHOW SLAVE STATUS, > Replicate_Do_Domain_Ids, 1); > > +let $ignore_domain_ids_before= query_get_value(SHOW SLAVE STATUS, > Replicate_Ignore_Domain_Ids, 1); > > +--echo DO_DOMAIN_IDS (before) : $do_domain_ids_before > > +--echo IGNORE_DOMAIN_IDS (before) : $ignore_domain_ids_before > > + > > +# Replicate events belonging to "domain_id 1". (DO_DOMAIN_IDS should > win!) > > +CHANGE MASTER TO DO_DOMAIN_IDS=(1), IGNORE_DOMAIN_IDS=(2), > MASTER_USE_GTID=current_pos; > > Did you intend here to test overlapping ids in DO_DOMAIN_IDS and > IGNORE_DOMAIN_IDS (due to the comment "DO_DOMAIN_IDS should win") ? > [NC] No, its about the DO_ and IGNORE_ both being non-empty. In that case DO_ should be preferred. > But if I understand correctly, IGNORE_DOMAIN_IDS will be ignored if > DO_DOMAIN_IDS is non-empty. [NC] Right. > So I think it should be an error to try to set > both - do you agree? > [NC] Following the current pattern, I think it should DO_ should be preferred. But I can change it fail with error if you would insist. > > > +--source include/rpl_end.inc > > Please add additional testcases, as suggested at the start of review. > > > > === modified file 'sql/rpl_mi.cc' > > --- a/sql/rpl_mi.cc 2014-02-19 10:05:15 +0000 > > +++ b/sql/rpl_mi.cc 2014-09-12 18:16:22 +0000 > > @@ -513,6 +513,25 @@ > > else > > mi->using_gtid= Master_info::USE_GTID_NO; > > } > > + > > + // DO_DOMAIN_IDS > > + if (0 == strncmp(buf, STRING_WITH_LEN("do_domain_ids=")) && > > + mi->domain_id_filter.init_ids(buf + > sizeof("do_domain_ids"), > > + > Domain_id_filter::DO_DOMAIN_IDS)) > > + { > > + sql_print_error("Failed to initialize master info > do_domain_ids"); > > + goto errwithmsg; > > + } > > + > > + // IGNORE_DOMAIN_IDS > > + if (0 == strncmp(buf, STRING_WITH_LEN("ignore_domain_ids=")) > && > > + mi->domain_id_filter.init_ids(buf + > sizeof("ignore_domain_ids"), > > + > Domain_id_filter::IGNORE_DOMAIN_IDS)) > > + { > > + sql_print_error("Failed to initialize master info " > > + "ignore_domain_ids"); > > + goto errwithmsg; > > + } > > So here you use the same style as adjacent code (that I wrote). However, I > have since learned that the style (0 == strncmp(...)) is not used in the > server. So please change this, as well as my existing code, to use either > strncmp(...) == 0 or !strncmp(), as you prefer. > [NC] Done. > > Another thing from my old code is the sizeof("do_domain_ids"), which highly > confused me for a bit before I realised that while the '=' is not included > in > the sizeof(), the terminating '\0' _is_ included. So since even I myself is > confused by this code, better clarify it ;-) Perhaps it would be clearer if > changed to (sizeof("do_domain_ids=")-1) (both in my old code and in your > new > code). > > (Normally it's good to follow the style of existing code as you did, I just > noticed this in the review and realised that my own "existing code" would > benefit from these small improvements). > [NC] I did a bit of refactoring. Let me know if it looks ok. > > > @@ -634,6 +653,16 @@ > > } > > } > > > > + char* do_domain_ids_buf= > > + mi->domain_id_filter.as_string(Domain_id_filter::DO_DOMAIN_IDS); > > + if (do_domain_ids_buf == NULL) > > + DBUG_RETURN(1); > > + > > + char* ignore_domain_ids_buf= > > + mi->domain_id_filter.as_string(Domain_id_filter::IGNORE_DOMAIN_IDS); > > + if (ignore_domain_ids_buf == NULL) > > + DBUG_RETURN(1); > > You have a memory leak here, do_domain_ids_buf will not be freed if > ignore_domain_ids_buf fails due to out-of-memory. > [NC] Fixed. > > > @@ -1348,4 +1382,218 @@ > > DBUG_RETURN(result); > > } > > > > +/* ctor */ > > +Domain_id_filter::Domain_id_filter() : m_filter(false) > > +{ > > + for (int i= DO_DOMAIN_IDS; i <= IGNORE_DOMAIN_IDS; i ++) > > + { > > + my_init_dynamic_array(&m_domain_ids[i], sizeof(uint32), 16, 16, > MYF(0)); > > + } > > +} > > + > > +/* dtor */ > > +Domain_id_filter::~Domain_id_filter() > > I don't think comments like /* ctor */ and /* dtor */ are useful, better > just > omit them. > [NC] Done. > > > > +/** > > + Update the domain id list of the given type with elements from the > new list. > > + > > + @param new_ids [IN] new list of ids > > + @param type [IN] domain id list type to update > > + @param changed [IN] has the domain id list changed in > the > > + last CHANGE MASTER command? > > + > > + @retval void > > +*/ > > +void Domain_id_filter::update_ids(DYNAMIC_ARRAY *new_ids, > enum_list_type type, > > + bool changed) > > +{ > > + DYNAMIC_ARRAY *ids= &m_domain_ids[type]; > > + > > + if (changed == true) > > + reset_dynamic(ids); > > + > > + /* bsearch requires an ordered list. */ > > + sort_dynamic(new_ids, change_master_domain_id_cmp); > > + > > + for (uint i= 0; i < new_ids->elements; i++) > > + { > > + uint32 domain_id; > > + get_dynamic(new_ids, (void *) &domain_id, i); > > + > > + if (bsearch((const uint32 *) &domain_id, ids->buffer, ids->elements, > > + sizeof(uint32), change_master_domain_id_cmp) == NULL) > > + { > > + insert_dynamic(ids, (uint32*) &domain_id); > > + } > > + } > > + return; > > +} > > Ok, so I understand that this code is modelled after the existing code for > IGNORED_SERVER_IDS. But I found it rather confusing, because it looks like > there is a bug: if changed==false, then the code does not ensure that the > resulting array will still be sorted. > > I guess the bug is not possible to trigger, because the list of stuff to > add > will only be non-empty if changed==true? But that is rather confusing. > Please > simplify the logic; maybe you only need to call this function when > changed==true, so the parameter can be omitted and the array always reset? > [NC] Right, in case of domain_ids the list should only be reset/updated when it has changed. I have remove the extra parameter. > Also, consider if you can rewrite the existing code for IGNORE_SERVER_IDS > so > that the same common code can be used as for IGNORE_DOMAIN_IDS and > DO_DOMAIN_IDS, that would simplify the code and reduce risk of bugs or > inconsistencies. > [NC] Done. > > > > +/** > > + Initialize the given domain id list with the numbers from the > specified > > + buffer. > > + > > + @param buf [IN] buffer > > + @param type [IN] domain id list type > > + > > + @retval false Success > > + true Error > > +*/ > > +bool Domain_id_filter::init_ids(char *buf, enum_list_type type) > > +{ > > + char *token, *last; > > + uint num_items; > > + DYNAMIC_ARRAY *ids= &m_domain_ids[type]; > > + > > + token= strtok_r(buf, " ", &last); > > + if (token == NULL) > > + { > > + return true; /* error */ > > + } > > + > > + num_items= atoi(token); > > + for (uint i= 0; i < num_items; i++) > > + { > > + token= strtok_r(NULL, " ", &last); > > + if (token == NULL) > > + { > > + return true; /* error */ > > + } > > + else > > + { > > + uint32 val= atoi(token); > > + insert_dynamic(ids, (uchar *) &val); > > + } > > + } > > + return false; > > +} > > Again, did you consider using the same code for this and for > IGNORE_SERVER_IDS? (I think you did, because you added a comment to > init_dynarray_intvar_from_file ;-). Maybe consider it again; it is fine to > rewrite/remove the existing code for IGNORE_SERVER_IDS and replace it with > your new code. Again, it would reduce risk for bugs or inconsistencies. > > [NC] True, done. > > > + /* getter */ > > + bool get_filter() { return m_filter; } > > + > > + /* setter */ > > + void set_filter(uint32 domain_id); > > These are not really getter and setters, I think, at least I find those > names > confusing. > > Maybe something like do_filter(uint32 domain_id) and is_group_filtered() ? > And expand the comments, so that they explain that these functions > respectively check a GTID for needing to be filtered, and return whether > the > current group is being filtered. > [NC] Done. > > Also, personally I dislike setter/getter, and would prefer just to access > the > m_filter as a public member. But I suppose that's style and up to you. > [NC] I have kept it private for now to avoid accidental modification. > > > + /* > > + Initialize the given domain id list with the numbers from the > specified > > + buffer. > > + > > + @param buf [IN] buffer > > + @param type [IN] domain id list type > > + > > + @retval false Success > > + true Error > > + */ > > + bool init_ids(char *buf, enum_list_type type); > > Clarify in the comment that the list is initialized from a string of > space-separated numbers, the first of which is the number of following > entries. > [NC] Done. > > > > @@ -182,6 +279,7 @@ > > bool flush_relay_log_cache, > > bool need_lock_relay_log); > > int change_master_server_id_cmp(ulong *id1, ulong *id2); > > +int change_master_domain_id_cmp(uint32 *id1, uint32 *id2); > > I think change_master_domain_id_cmp() is only used in rpl_mi.cc, so should > be > static there, and not refered here in rpl_mi.h > [NC] Done. Also renamed it to change_master_ids_cmp to be used by IGNORE_SERVER_IDS impl as well. > > > > === modified file 'sql/slave.cc' > > --- a/sql/slave.cc 2014-08-12 03:55:41 +0000 > > +++ b/sql/slave.cc 2014-09-12 18:16:22 +0000 > > @@ -1259,6 +1259,7 @@ > > going to ignore (to not log them in the relay log). > > Items being read are supposed to be decimal output of values of a > > type shorter or equal of @c long and separated by the single space. > > + It also used to restore DO_DOMAIN_IDS & IGNORE_DOMAIN_IDS lists. > > This comment suggests you wanted to share code, then later changed your > mind. If you end up keeping the code separate, you still need to delete > this > comment. (And there is a typo: "It also" -> "It is also", but if you > remove it > anyway...). > > [NC] I have modified the patch to use this function now. > > > @@ -5601,6 +5610,10 @@ > > mi->last_queued_gtid= event_gtid; > > mi->last_queued_gtid_standalone= > > (gtid_flag & Gtid_log_event::FL_STANDALONE) != 0; > > + > > + /* Should filter all the subsequent events in the current GTID > group? */ > > + mi->domain_id_filter.set_filter(event_gtid.domain_id); > > So, here we will set m_filter to true, if we want to skip the following > group. > > But I don't see any code that will set it to false at the end of the > group? [NC] With the previous logic m_filter was set/reset only at GTID event. I have now added the logic to reset this flag on COMMIT/XID events too. > That seems a problem, as we could then end up wrongly skipping all kinds of > other events, such as Gtid_list, Incident, ... > [NC] Right. I have added the following events types in the exception list : FORMAT_DESCRIPTION_EVENT, ROTATE_EVENT, GTID_LIST_EVENT, HEARTBEAT_LOG_EVENT and INCIDENT_EVENT. > Also note that it is possible for a GTID group to not be terminated by > COMMIT/XID. This can happen at least if the master crashes in the middle of > writing a group; the slave will receive only the first part of it, > followed by > a master restart Format_description event. That case needs to be handled, > and > you should add a test case for it. > [NC] Added a test to cover this scenario. > > Also it is possible for eg. a master downgrade to introduce following event > groups that do not start with GTID - seems these would also be wrongly > skipped? So we also need a test case for this... > [NC] Right, its a valid case, but I couldn't find a way to repeat this scenario in mtr. Suggestions? > > Hope this helps, > Thanks! -- Nirbhay > > - Kristian. >
_______________________________________________ Mailing list: https://launchpad.net/~maria-developers Post to : [email protected] Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp

