On 21.6.2013 16:19, Simo Sorce wrote:
On Thu, 2013-06-20 at 14:30 +0200, Petr Spacek wrote:
On 23.5.2013 16:32, Simo Sorce wrote:
On Thu, 2013-05-23 at 14:35 +0200, Petr Spacek wrote:
It looks like we agree on nearly all points (I apologize if I missed something). I will prepare a design document for the transition to RBTDB and then
another design document for the DNSSEC implementation.

The current version of the design is available at:

Great write-up, thanks.

There are several questions inside (search for the text "Question"; it should find
all of them). I would like to get your opinion on these problems.

Note that the 389 DS team decided to implement RFC 4533 (syncrepl), so persistent
search is definitely obsolete and we can do synchronization in some clever way.

Answering inline here after quoting the questions for the doc:

         > Periodical re-synchronization
         > Questions

               * Do we still need periodical re-synchronization if 389 DS
                 team implements RFC 4533 (syncrepl)? It wasn't
                 considered in the initial design.

We probably do. We have to be especially careful about the case when a
replica is re-initialized. We should either automatically detect that
this is happening or change ipa-replica-manage to kick named somehow.

We also need a tool, or maybe a special attribute in LDAP that is
monitored, so that we can tell bind-dyndb-ldap to do a full rebuild of
the cache on demand. This way admins can force a rebuild if they end up
noticing something wrong.
Is it acceptable to let the admin delete the files & restart named manually? I don't want to overcomplicate things at the beginning ...

               * What about dynamic updates during re-synchronization?

Should we return a temporary error ? Or maybe just queue up the change
and apply it right after the resync operation has finished ?
Unfortunately, the only reasonable error code is SERVFAIL. It is completely up to the client whether it retries the update or not.

I personally don't like queuing updates because it confuses clients: the update is accepted by the server, but the client can still see the old value (for a limited period of time).

               * How to get sorted list of entries from LDAP? Use LDAP
                 server-side sorting? Do we have necessary indices?

We can do client side sorting as well I guess, I do not have a strong
opinion here. The main reason why you need ordering is to detect delete
records right ?
Exactly. I realized that server-side sorting doesn't make sense because we plan to use syncrepl, so there is nothing to sort - only the flow of incremental updates.

> Is there a way to mark rbtdb records as updated instead
(with a generation number) and then do a second pass on the rbtdb tree
and remove any record that was not updated with the generation number ?
There is no 'generation' number, but we can extend the auxiliary database (i.e. the database with the UUID=>DNS name mapping) with a generation number. We will get a UUID along with each update from LDAP, so we can simply use the UUID for database lookups.

Then we can go through the UUID database and delete all records which don't have generation == expected_value.

This would also allow us to keep accepting dynamic updates by simply
marking records as generation+1 so that the resync will not overwrite
records that are updated during the resync phase.
I agree. The simplest variant can solve the basic case where one update is received during re-synchronization.

Proposed (simple) solution:
1) At the beginning of re-synchronization, set curr_gen = prev_gen+1
2) For each entry in LDAP do (via syncrepl):
- Only if entry['gen'] <  curr_gen:
--  Overwrite data in local RBTDB with data from LDAP
--  Overwrite entry['gen'] = curr_gen
- Else: Do nothing

In parallel:
1) Update request received from a client
2) Write new data to LDAP (syncrepl should cope with this)
3) Read UUID from LDAP (via RFC 4527 controls)
4) Write curr_gen to UUID database
5) Write data to local RBTDB
6) Reply 'update accepted' to the client
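A minimal sketch of how the two flows above could interact, assuming an in-memory dict in place of the real UUID database; class and method names are made up for illustration, not taken from bind-dyndb-ldap:

```python
class GenerationDB:
    """Toy model of the UUID database: uuid -> (DNS data, generation)."""

    def __init__(self):
        self.curr_gen = 0
        self.entries = {}  # uuid -> {'data': ..., 'gen': int}

    def start_resync(self):
        # 1) At the beginning of re-synchronization, set curr_gen = prev_gen + 1.
        self.curr_gen += 1

    def resync_entry(self, uuid, ldap_data):
        # 2) For each entry delivered by syncrepl: overwrite the local copy
        #    only if it was not already touched during this re-synchronization.
        entry = self.entries.get(uuid)
        if entry is None or entry['gen'] < self.curr_gen:
            self.entries[uuid] = {'data': ldap_data, 'gen': self.curr_gen}

    def dynamic_update(self, uuid, new_data):
        # Parallel path: the DNS dynamic update was already written to LDAP;
        # tag the local copy with curr_gen so a running resync skips it.
        self.entries[uuid] = {'data': new_data, 'gen': self.curr_gen}

    def finish_resync(self):
        # Sweep: delete every record that was not seen during the resync.
        self.entries = {u: e for u, e in self.entries.items()
                        if e['gen'] == self.curr_gen}
```

Note how an entry written by `dynamic_update()` mid-resync is not clobbered when syncrepl later echoes the same (or an older) value back, and how the final sweep removes entries that disappeared from LDAP.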

A crash at any time should not hurt: curr_gen will be incremented on restart and the re-synchronization will be restarted.

The worst case is that the update is stored in LDAP but the client does not get a reply because of the crash (i.e. the client times out).

There is a drawback: Two or more successive updates to a single entry can create race condition, as described at https://fedorahosted.org/bind-dyndb-ldap/wiki/BIND9/Design/RBTDB#Raceconditions1 .

The reason is that the generation number is not incremented on each change, but only overwritten with the current global value (i.e. old + 1).

I don't like the other option, incrementing the generation number on each change. It could create nasty corner cases during re-synchronization and when handling updates made directly in LDAP or by another DNS server.

It is not nice, but I think that we can live with it. The important fact is that consistency will be (eventually) re-established.

         > (Filesystem) cache maintenance

         > Questions: How often should we save the cache from operating
         memory to disk?

A prerequisite to be able to evaluate this question: how expensive is it
to save the cache?
My test zone contains 65535 AAAA records, 255 A records, 1 SOA + 1 NS record.

Benchmark results:
zone dump   < 0.5 s (to text file)
zone load   < 1 s (from text file)
zone delete < 9 s (LOL. This is caused by implementation details of RBTDB.)

LDAP search on the whole sub-tree: < 15 s
Load time for bind-dyndb-ldap 3.x: < 120 s

> Is DNS responsive during the save or does the
operation block updates or other functionality ?
AFAIK it should not affect anything. The internal transaction mechanism should handle all these situations and allow queries/updates to proceed.

               * On shutdown only?

NACK, you are left with very stale data on crashes.

               * On start-up (after initial synchronization) and on

It makes sense to dump right after a big synchronization if it doesn't
add substantial operational issues. Otherwise maybe a short interval
after synchronization.

               * Periodically? How often? At the end of periodical

Periodically is probably a good idea. If I understand it correctly, it
means that it will make it possible to substantially reduce the load on
startup, as we will have less data to fetch from a syncrepl request.
We probably misunderstood each other. I thought that re-synchronization would trigger a full reload from LDAP, so the whole sub-tree would be transferred on each re-synchronization. (I.e. syncrepl would be started again without the 'cookie'.)

For example:
0:00 BIND start, changes from the last known state requested
0:02 changes were applied to local copy - consistency should be restored
0:05 incremental update from LDAP came in
0:55 DNS dynamic update came in, local copy & LDAP were updated
0:55 incremental update from LDAP came in (i.e. the update from previous line)
1:05 incremental update from LDAP came in
4:05 incremental update from LDAP came in
8:00 full reload is started (by timer)
8:05 full reload is finished (all potential inconsistencies were corrected)
9:35 incremental update from LDAP came in

It is a pretty demanding game. That is the reason why I asked whether we want to do re-synchronizations automatically...

Originally, I planned to write a script which would compare the data in LDAP with the zone file on disk. This script could be used for debugging & automated testing, so we can assess whether the code behaves correctly and decide whether we want to implement automatic re-synchronization where necessary.

In all cases, the admin can simply delete files on disk and restart BIND - everything will be downloaded from LDAP again.

               * Each N updates?

I prefer a combination of 'each N updates' but with time limits to avoid
doing it too often.
I.e. something like every 1000 changes, but not more often than every 30
minutes and not less often than every 8 hours. (Numbers completely made up;
they need to be tuned based on the answer to the prerequisites question.)
Sounds reasonable.
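This policy is easy to express in code. A sketch, with the thresholds (1000 changes, 30 minutes, 8 hours) taken from the discussion above as placeholders to be tuned, and the function name purely illustrative:

```python
MIN_INTERVAL = 30 * 60        # seconds: never dump more often than this
MAX_INTERVAL = 8 * 60 * 60    # seconds: always dump at least this often
CHANGE_LIMIT = 1000           # pending changes that trigger a dump

def should_dump(pending_changes, now, last_dump_time):
    """Decide whether the in-memory zone should be dumped to disk."""
    elapsed = now - last_dump_time
    if elapsed < MIN_INTERVAL:
        return False          # rate limit, even under heavy update load
    if elapsed >= MAX_INTERVAL:
        return True           # upper bound, even if the zone is idle
    return pending_changes >= CHANGE_LIMIT
```

The rate limit also caps the damage from the fringe case mentioned below (one record updated thousands of times in a burst): no matter how many changes arrive, the dump runs at most once per MIN_INTERVAL.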

               * If N % of the database was changed? (pspacek's favorite)

The problem with using % database is that for very small zones you risk
getting stuff saved too often, as changing a few records quickly makes
the % big compared to the zone size. For example a zone with 50 records
has a 10% change after just 5 records are changed. Conversely a big zone
requires a huge amount of changes before the % of changes builds up
leading potentially to dumping the database too infrequently. For example,
with a zone of 100000 records you have to get 10000 changes before you
reach the 10% mark. If dyndns updates are disabled, this means the zone
may never get saved for weeks or months.
A small zone will also syncrepl quickly, so it would be useless to save
it often, while a big zone is better kept up to date on disk so that the
syncrepl operation will cost less on startup.

Finally, N % is also hard to compute. What do you count into it?
Only the total number of records changed? Or do you also factor in
whether the same record is changed multiple times?
Consider fringe cases: a zone with 1000 entries where only 1 entry is
changed 2000 times in a short period (a malfunctioning client (or an
attacker) sending lots of updates for its record).

I will add another option:
* After each re-synchronization (including start-up) + on shutdown.

This is my favourite, but it depends on the re-synchronization intervals. It could be combined with the 'each N updates + time limits' approach described above.

Additional questions:

I see you mention:
"Cache non-existing records, i.e. do not repeat LDAP search for each

I assume this is fine and we rely on syncrepl to give us an update and
override the negative cache if the record that has been negatively
cached suddenly appears via replication through another master, right ?
Yes. The point is that there will not be any 'cache', but rather an authoritative copy of the DNS sub-tree from LDAP. A hit or miss in the 'local copy' will be authoritative.

If we rely on syncrepl, are we going to ever make direct LDAP searches
at all ? Or do we rely fully on having it send us any changes and
therefore we always reply directly from the rbtdb database ?
Basically yes, we don't need to do any searches. We will use separate connections only for LDAP modifications (DNS dynamic updates).

The only 'search-like' operation (besides syncrepl) will be the Read Entry Controls after modification (RFC 4527). These allow us to read the UUID of a newly created entry in LDAP without an additional search.
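For illustration, the OpenLDAP command-line tools expose the RFC 4527 post-read control via the `-e postread` general extension; the same thing can be done programmatically over the modification connection. The server URL, DN and zone name below are made up:

```shell
# Hypothetical example: add an A record and ask the server to return the
# entryUUID of the new entry in the same operation -- no extra search.
ldapmodify -H ldap://ldap.example.com -Y GSSAPI \
    -e 'postread=entryUUID' <<'EOF'
dn: idnsName=host1,idnsName=example.com,cn=dns,dc=example,dc=com
changetype: add
objectClass: idnsRecord
idnsName: host1
aRecord: 192.0.2.1
EOF
```

On success, the server's response includes a post-read copy of the new entry carrying its entryUUID, which can go straight into the UUID=>DNS name database.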

Petr^2 Spacek

Freeipa-devel mailing list
