On Thu, 2013-06-27 at 18:23 +0200, Petr Spacek wrote:
> On 21.6.2013 16:19, Simo Sorce wrote:
> > On Thu, 2013-06-20 at 14:30 +0200, Petr Spacek wrote:
> >> On 23.5.2013 16:32, Simo Sorce wrote:
> >>> On Thu, 2013-05-23 at 14:35 +0200, Petr Spacek wrote:
> >>>> It looks like we agree on nearly all points (I apologize if I
> >>>> overlooked something). I will prepare a design document for the
> >>>> transition to RBTDB and then another design document for the DNSSEC
> >>>> implementation.
> >>
> >> The current version of the design is available at:
> >> https://fedorahosted.org/bind-dyndb-ldap/wiki/BIND9/Design/RBTDB
> >
> > Great write-up, thanks.
> >
> >> There are several questions inside (search for the text "Question"; it
> >> should find all of them). I would like to get your opinion about the
> >> problems.
> >>
> >> Note that the 389 DS team decided to implement RFC 4533 (syncrepl), so
> >> persistent search is definitely obsolete and we can do synchronization
> >> in some clever way.
> >
> >
> > Answering inline here after quoting the questions for the doc:
> >
> >          > Periodical re-synchronization
> >          >
> >          > Questions
> >
> >                * Do we still need periodical re-synchronization if 389 DS
> >                  team implements RFC 4533 (syncrepl)? It wasn't
> >                  considered in the initial design.
> >
> > We probably do. We have to be especially careful of the case when a
> > replica is re-initialized. We should either automatically detect that
> > this is happening or change ipa-replica-manage to kick named somehow.
> >
> > We also need a tool or maybe a special attribute in LDAP that is
> > monitored so that we can tell  bind-dyndb-ldap to do a full rebuild of
> > the cache on demand. This way admins can force a rebuild if they end up
> > noticing something wrong.
> Is it acceptable to let the admin delete files & restart named manually? I
> don't want to overcomplicate things at the beginning ...

Sure, probably fine; we can have a tool that simply does that for
starters, and later on we can make it do more complex things if needed.

> >                * What about dynamic updates during re-synchronization?
> >
> > Should we return a temporary error ? Or maybe just queue up the change
> > and apply it right after the resync operation has finished ?
> Unfortunately, the only reasonable error code is SERVFAIL. It is completely
> up to the client whether it tries the update again or not.
> 
> I personally don't like queuing of updates because it confuses clients: the
> update is accepted by the server but the client can still see an old value
> (for a limited period of time).

Another option is to mark fields so that they are not updated with older
values, and just allow the thing to succeed.

> >                * How to get sorted list of entries from LDAP? Use LDAP
> >                  server-side sorting? Do we have necessary indices?
> >
> > We can do client-side sorting as well I guess, I do not have a strong
> > opinion here. The main reason why you need ordering is to detect deleted
> > records, right ?
> Exactly. I realized that server-side sorting doesn't make sense because we 
> plan to use syncrepl, so there is nothing to sort - only the flow of 
> incremental updates.

Syncrepl includes notice of deletions too, right ?

> > Is there a way to mark rbtdb records as updated instead
> > (with a generation number) and then do a second pass on the rbtdb tree
> > and remove any record that was not updated with the generation number ?
> There is no 'generation' number, but we can extend the auxiliary database
> (i.e. the database with the UUID=>DNS name mapping) with a generation
> number. We will get the UUID along with each update from LDAP, so we can
> simply use the UUID for the database lookup.
> 
> Then we can go through the UUID database and delete all records which don't
> have generation == expected_value.

Yes, something like this should work.
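
Roughly what I have in mind, as a quick C sketch (the uuid_record struct and
the helper names are made up for illustration, they are not the real
bind-dyndb-ldap types):

#include <stdint.h>
#include <stdlib.h>

/* Hypothetical entry in the auxiliary UUID => DNS name database. */
struct uuid_record {
        char uuid[37];             /* entryUUID as text */
        char dns_name[256];        /* owner name kept in the RBTDB */
        uint32_t generation;       /* last generation that touched it */
        struct uuid_record *next;
};

/*
 * After a full re-synchronization finished with generation curr_gen,
 * walk the auxiliary database and drop from the RBTDB every name whose
 * record was not touched during the resync (generation < curr_gen).
 */
static void
sweep_stale_records(struct uuid_record **head, uint32_t curr_gen,
                    void (*delete_from_rbtdb)(const char *dns_name))
{
        struct uuid_record **cur = head;

        while (*cur != NULL) {
                if ((*cur)->generation < curr_gen) {
                        struct uuid_record *stale = *cur;

                        delete_from_rbtdb(stale->dns_name);
                        *cur = stale->next;   /* unlink from the UUID db */
                        free(stale);
                } else {
                        cur = &(*cur)->next;
                }
        }
}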

> > This would also allow us to keep accepting dynamic updates by simply
> > marking records as generation+1 so that the resync will not overwrite
> > records that are updated during the resync phase.
> I agree. The simplest variant can solve the basic case where 1 update was 
> received during re-synchronization.
> 
> Proposed (simple) solution:
> 1) At the beginning of re-synchronization, set curr_gen = prev_gen+1
> 2) For each entry in LDAP do (via syncrepl):
> - Only if entry['gen'] <  curr_gen:
> --  Overwrite data in local RBTDB with data from LDAP
> --  Overwrite entry['gen'] = curr_gen
> - Else: Do nothing
> 
> In parallel:
> 1) Update request received from a client
> 2) Write new data to LDAP (syncrepl should cope with this)
> 3) Read UUID from LDAP (via RFC 4527 controls)
> 4) Write curr_gen to UUID database
> 5) Write data to local RBTDB
> 6) Reply 'update accepted' to the client
> 
> A crash at any time should not hurt: curr_gen will be incremented on restart
> and the re-synchronization will be restarted.

Yep.
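
To make sure we are on the same page, the two paths could look roughly like
this (again only a sketch; the ldap_backend_*, uuid_db_* and rbtdb_* helpers
are placeholders, not real libldap or BIND9 calls):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal view of the auxiliary database entry from the sketch above. */
struct uuid_record {
        char dns_name[256];
        uint32_t generation;
};

/* Placeholder prototypes -- not real library functions. */
bool ldap_backend_write(const char *dns_name, const void *new_data);
bool ldap_backend_read_uuid(const char *dns_name, char *uuid, size_t len);
void uuid_db_set_generation(const char *uuid, const char *dns_name,
                            uint32_t gen);
void rbtdb_overwrite(const char *dns_name, const void *data);

/* Path 1: entry received from syncrepl during re-synchronization. */
static void
apply_syncrepl_entry(struct uuid_record *rec, const void *ldap_data,
                     uint32_t curr_gen)
{
        if (rec->generation < curr_gen) {
                rbtdb_overwrite(rec->dns_name, ldap_data);
                rec->generation = curr_gen;
        }
        /* else: a dynamic update already touched it during the resync,
         * so do not overwrite it with the (potentially older) LDAP data */
}

/* Path 2: DNS dynamic update received from a client. */
static bool
handle_dynamic_update(const char *dns_name, const void *new_data,
                      uint32_t curr_gen)
{
        char uuid[37];

        if (!ldap_backend_write(dns_name, new_data))        /* 2) LDAP     */
                return false;                               /*    SERVFAIL */
        if (!ldap_backend_read_uuid(dns_name, uuid, sizeof(uuid)))
                return false;                               /* 3) RFC 4527 */
        uuid_db_set_generation(uuid, dns_name, curr_gen);   /* 4) mark gen */
        rbtdb_overwrite(dns_name, new_data);                 /* 5) RBTDB    */
        return true;                                         /* 6) accepted */
}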

> The worst case is that the update will be stored in LDAP but the client will
> not get a reply because of the crash (i.e. the client times out).

Not a big deal. This can always happen for clients, as the network
connection might be severed after the reply is sent and before it is
received. So clients must always be prepared for this event.

> There is a drawback: two or more successive updates to a single entry can
> create a race condition, as described at
> https://fedorahosted.org/bind-dyndb-ldap/wiki/BIND9/Design/RBTDB#Raceconditions1 .
> 
> The reason is that the generation number is not incremented each time, but
> only overwritten with the current global value (i.e. old + 1).
> 
> 
> I don't like the other option of incrementing the generation number. It
> could create nasty corner cases during re-synchronization and when handling
> updates made directly in LDAP or by another DNS server.
> 
> It is not nice, but I think that we can live with it. The important fact is 
> that consistency will be (eventually) re-established.

Yes, I think it is a corner case we can live with.

> >          > (Filesystem) cache maintenance
> >
> >          > Questions: How often should we save the cache from operating
> >          memory to disk?
> >
> > A prerequisite to be able to evaluate this question: how expensive is it
> > to save the cache ?
> My test zone contains 65535 AAAA records, 255 A records, 1 SOA + 1 NS record.
> 
> Benchmark results:
> zone dump   < 0.5 s (to text file)
> zone load   < 1 s (from text file)
> zone delete < 9 s (LOL. This is caused by implementation details of RBTDB.)
> 
> LDAP search on the whole sub-tree: < 15 s

Ouch, this looks very slow, missing indexes ?
Is this just the search? Or is it search + zone load ?

> Load time for bind-dyndb-ldap 3.x: < 120 s

So, a reload from scratch can take many tens of seconds on big zones. Did
this test include DNSSEC signing ? Or would we need to add that on top ?

> > Is DNS responsive during the save or does the
> > operation block updates or other functionality ?
> AFAIK it should not affect anything. The internal transaction mechanism
> should handle all these situations and allow queries/updates to proceed.
> 
> >                * On shutdown only?
> >
> > NACK, you are left with very stale data on crashes.
> >
> >                * On start-up (after initial synchronization) and on
> >                  shutdown?
> >
> > It makes sense to dump right after a big synchronization if it doesn't
> > add substantial operational issues. Otherwise maybe a short interval
> > after synchronization.
> >
> >                * Periodically? How often? At the end of periodical
> >                  re-synchronization?
> >
> > Periodically is probably a good idea; if I understand it correctly, it
> > means that it will make it possible to substantially reduce the load on
> > startup, as we will have less data to fetch from a syncrepl request.
> We probably misunderstood each other. I thought that re-synchronization would
> trigger a full reload from LDAP, so the whole sub-tree would be transferred
> on each re-synchronization. (I.e. syncrepl would be started again without the
> 'cookie'.)

No, this was my understanding as well.

> For example:
> time|event
> 0:00 BIND start, changes from the last known state requested
> 0:02 changes were applied to local copy - consistency should be restored
> 0:05 incremental update from LDAP came in
> 0:55 DNS dynamic update came in, local copy & LDAP were updated
> 0:55 incremental update from LDAP came in (i.e. the update from previous line)
> 1:05 incremental update from LDAP came in
> 4:05 incremental update from LDAP came in
> 8:00 full reload is started (by timer)
> 8:05 full reload is finished (all potential inconsistencies were corrected)
> 9:35 incremental update from LDAP came in
> ...
> 
> It is a pretty demanding game. That is the reason why I asked whether we want
> to do re-synchronizations automatically...

Right.

> Originally, I planned to write a script which would compare the data in LDAP
> with the zone file on disk. This script could be used for debugging &
> automated testing, so we can assess whether the code behaves correctly and
> decide if we want to implement automatic re-synchronization when necessary.

Wouldn't this script be subject to races depending on when it accesses
either LDAP or the file ?

The main issue here is that it is hard to know when doing a full re-sync
is necessary. And because it is expensive I am wary of doing it
automatically too often.

However, perhaps a timed event so it is done once a day is not a bad
idea.

> In all cases, the admin can simply delete files on disk and restart BIND - 
> everything will be downloaded from LDAP again.

Right, we should wrap this knowledge into a tool that does it for the
admin like we did with the sss_cache tool for sssd caches.

> >                * Each N updates?
> >
> > I prefer a combination of each N updates but with time limits to avoid
> > doing it too often.
> > I.e. something like every 1000 changes, but not more often than every 30
> > minutes and not less often than every 8 hours. (Numbers completely made up
> > and need to be tuned based on the answer to the prerequisites question
> > above.)
> Sounds reasonable.
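
The trigger I have in mind is something like this (sketch only; the
constants are made up and would need tuning once we know how expensive a
dump really is):

#include <stdbool.h>
#include <time.h>

#define DUMP_MIN_CHANGES   1000          /* dump after this many changes...  */
#define DUMP_MIN_INTERVAL  (30 * 60)     /* ...but not more than every 30 min */
#define DUMP_MAX_INTERVAL  (8 * 60 * 60) /* and at least once every 8 hours   */

struct dump_state {
        unsigned int changes_since_dump;
        time_t last_dump;
};

/*
 * Called after every applied change (and from a periodic timer, so the
 * 8 hour upper bound is honoured even when no updates arrive).  The
 * caller resets changes_since_dump and last_dump after a dump.
 */
static bool
should_dump(const struct dump_state *st)
{
        time_t now = time(NULL);

        if (now - st->last_dump >= DUMP_MAX_INTERVAL)
                return true;
        if (st->changes_since_dump >= DUMP_MIN_CHANGES &&
            now - st->last_dump >= DUMP_MIN_INTERVAL)
                return true;
        return false;
}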
> 
> >                * If N % of the database was changed? (pspacek's favorite)
> >
> > The problem with using % database is that for very small zones you risk
> > getting stuff saved too often, as changing a few records quickly makes
> > the % big compared to the zone size. For example a zone with 50 records
> > has a 10% change after just 5 records are changed. Conversely a big zone
> > requires a huge number of changes before the % of changes builds up,
> > potentially leading to dumping the database too infrequently. For example,
> > a zone with 100000 records means you have to get 10000 changes before you
> > come to the 10% mark. If dyndns updates are disabled this means the zone
> > may never get saved for weeks or months.
> > A small zone will also syncrepl quickly so it would be useless to save
> > it often while a big zone is better if it is up to date on disk so the
> > syncrepl operation will cost less on startup.
> >
> > Finally N % is also hard to compute. What do you count into it ?
> > Only the total number of records changed ? Or do you also factor in whether
> > the same record is changed multiple times ?
> > Consider fringe cases: a zone with 1000 entries where only 1 entry is
> > changed 2000 times in a short period (a malfunctioning client (or an
> > attack) sending lots of updates for its record).
> 
> I will add another option:
> * After each re-synchronization (including start-up) + on shutdown.
> 
> This is my favourite, but it is dependent on re-synchronization intervals. It 
> could be combined with 'each N updates + time limits' described above.
> 
> 
> > Additional questions:
> >
> > I see you mention:
> > "Cache non-existing records, i.e. do not repeat LDAP search for each
> > query"
> >
> > I assume this is fine and we rely on syncrepl to give us an update and
> > override the negative cache if the record that has been negatively
> > cached suddenly appears via replication through another master, right ?
> Yes. The point is that there will not be any 'cache', but an authoritative
> copy of the DNS sub-tree from LDAP. A hit or miss in the 'local copy' will be
> authoritative.

Ok.

> > If we rely on syncrepl, are we going to ever make direct LDAP searches
> > at all ? Or do we rely fully on having it send us any changes and
> > therefore we always reply directly from the rbtdb database ?
> Basically yes, we don't need to do any search. We will use separate
> connections only for LDAP modifications (DNS dynamic updates).

Ok.

> The only 'search-like' operation (except syncrepl) will be the Read Entry
> Controls after modification (RFC 4527). This allows us to read the UUID of a
> newly created entry in LDAP without an additional search.

Makes sense.
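
Just to spell it out, the modify + post-read flow would be roughly this
(sketch only; build_postread_control() and extract_uuid_from_control() are
placeholders for the BER encoding/decoding we still have to write, while
ldap_modify_ext(), ldap_result(), ldap_parse_result() and the RFC 4527 OID
are the real thing):

#include <stddef.h>
#include <ldap.h>

#define POST_READ_OID "1.3.6.1.1.13.2"   /* post-read control, RFC 4527 */

/* Placeholders: build a control with ldctl_oid = POST_READ_OID and the
 * requested attribute list ("entryUUID") BER-encoded as its value, and
 * pull entryUUID out of the returned response control. */
LDAPControl *build_postread_control(const char *attr);
int extract_uuid_from_control(LDAPControl **resctrls, char *uuid, size_t len);

static int
modify_and_read_uuid(LDAP *ld, const char *dn, LDAPMod **mods,
                     char *uuid, size_t uuidlen)
{
        LDAPControl *ctrl = build_postread_control("entryUUID");
        LDAPControl *sctrls[] = { ctrl, NULL };
        LDAPControl **resctrls = NULL;
        LDAPMessage *res = NULL;
        int rc, err, msgid;

        /* Single round trip: apply the modification and ask the server
         * to return the entry (restricted to entryUUID) as it looks
         * after the change. */
        rc = ldap_modify_ext(ld, dn, mods, sctrls, NULL, &msgid);
        if (rc != LDAP_SUCCESS)
                goto done;

        if (ldap_result(ld, msgid, LDAP_MSG_ALL, NULL, &res) <= 0) {
                rc = LDAP_OTHER;
                goto done;
        }

        /* res is freed by ldap_parse_result (freeit = 1). */
        rc = ldap_parse_result(ld, res, &err, NULL, NULL, NULL, &resctrls, 1);
        if (rc == LDAP_SUCCESS && err != LDAP_SUCCESS)
                rc = err;
        if (rc == LDAP_SUCCESS)
                rc = extract_uuid_from_control(resctrls, uuid, uuidlen);

        if (resctrls != NULL)
                ldap_controls_free(resctrls);
done:
        ldap_control_free(ctrl);
        return rc;
}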

Thanks a lot, this plan looks good to me.
Ship it! :-)

Simo.

-- 
Simo Sorce * Red Hat, Inc * New York
