Re: Bulk Loader : some ideas...

Howard Chu Sun, 17 Aug 2014 14:45:12 -0700

Emmanuel Lécharny wrote:

On 17/08/14 22:05, Howard Chu wrote:

Emmanuel Lécharny wrote:

On 17/08/14 17:07, Howard Chu wrote:

If we encounter an entry later in the LDIF that corresponds to one of
these missing DNs, the search in the RDN index will just return the
entryID we already assigned to it. We then remove the DN from the
missing DN list. The result is that the DB tables and entryIDs are
generated in DN order even if the entries aren't ordered in the LDIF.
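A minimal sketch of the single-pass idea described above, in Python. The names `ancestors_top_down`, `assign_entry_ids`, `rdn_index` and `missing` are hypothetical, a plain dict stands in for the RDN index, and DNs are split naively on commas (no escaping), so this only illustrates the ID assignment, not a real loader:

```python
def ancestors_top_down(dn):
    """Yield every proper ancestor of dn, from the root suffix downward."""
    rdns = dn.split(",")  # naive split: assumes no escaped commas in RDNs
    for i in range(len(rdns) - 1, 0, -1):
        yield ",".join(rdns[i:])


def assign_entry_ids(dns):
    """Assign entryIDs in one pass over DNs that may arrive in any order.

    Returns (rdn_index, missing): the DN -> entryID map and the set of
    DNs that were given an ID but never appeared in the input.
    """
    rdn_index = {}   # stand-in for the RDN index: DN -> entryID
    missing = set()  # DNs that got an ID before their entry was seen

    for dn in dns:
        # Make sure every ancestor has an ID before the entry itself.
        for parent in ancestors_top_down(dn):
            if parent not in rdn_index:
                rdn_index[parent] = len(rdn_index) + 1
                missing.add(parent)

        if dn in rdn_index:
            # Already assigned as a missing ancestor: reuse that ID.
            missing.discard(dn)
        else:
            rdn_index[dn] = len(rdn_index) + 1

    return rdn_index, missing
```

With the input `["cn=a,ou=people,dc=example", "dc=example"]`, the suffix gets ID 1 even though it appears second in the input, and `ou=people,dc=example` stays in the missing list because no entry for it was ever seen.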


The problem with this approach is that you lose the entryUUID stored in
the LDIF file (typically when you try to bulk load an extract done from
a replica: you want to keep this information).


So create a stub entry with a provisional entryUUID, and overwrite the
stub entry with the real entryUUID if you encounter the real entry
later. Still far cheaper than multiple passes thru the LDIF file.


It works in your case, not ours. Again, we need to order the master
table using the entry's UUID, as we also need to create the RDN index at
the same time. We can't pull one entry after the other and push them
into the master table, creating empty entries when we have missing
parents; it just won't produce an ordered master table (the master
table is a Btree<UUID, Entry>).


Then delete the stub entry and insert the new entry.
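A rough sketch of the stub approach discussed above, in Python. Everything here is an assumption for illustration: `load_with_stubs` is a hypothetical name, a dict keyed by UUID stands in for the Btree<UUID, Entry> master table, a second dict stands in for the RDN index, and DNs are split naively on commas:

```python
import uuid


def load_with_stubs(ldif_entries):
    """Load (dn, entry_uuid) pairs that may arrive in any order.

    Missing parents get a stub keyed by a provisional UUID. When the real
    entry shows up, the stub is deleted and the entry is inserted under
    the entryUUID carried in the LDIF.
    """
    master = {}      # stand-in for the master table: UUID -> entry
    dn_to_uuid = {}  # stand-in for the RDN index: DN -> UUID
    stubs = set()    # DNs currently represented by a stub

    for dn, entry_uuid in ldif_entries:
        rdns = dn.split(",")  # naive split: assumes no escaped commas
        # Create stubs for any ancestors not seen yet, top-down.
        for i in range(len(rdns) - 1, 0, -1):
            parent = ",".join(rdns[i:])
            if parent not in dn_to_uuid:
                provisional = uuid.uuid4()
                dn_to_uuid[parent] = provisional
                master[provisional] = {"dn": parent, "stub": True}
                stubs.add(parent)

        if dn in stubs:
            # Real entry found: drop the stub stored under the provisional key.
            del master[dn_to_uuid[dn]]
            stubs.discard(dn)

        dn_to_uuid[dn] = entry_uuid
        master[entry_uuid] = {"dn": dn, "stub": False}

    return master, dn_to_uuid, stubs
```

The sketch does not model anything that recorded the provisional UUID in the meantime (for example a child's reference to its parent); in a real store those references would presumably need to be updated as well.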

Obviously, if we had a side index for
UUID, pointing to the offsets of entries in the file, that would be a
different story (but we would still have to order the UUID index
separately, as a whole).

This is the reason we have two phases.
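For illustration only, the side-index idea mentioned above (UUID pointing to file offsets, ordered as a whole afterwards) might look roughly like this in Python. `build_uuid_offset_index` is a hypothetical name, and the LDIF handling is deliberately simplistic: records are assumed to be separated by blank lines, with a single unfolded `entryUUID:` line each:

```python
import io


def build_uuid_offset_index(stream):
    """Scan a binary LDIF stream once and return [(entryUUID, offset), ...]
    sorted by entryUUID, where offset is the start of the record."""
    index = []
    record_start = stream.tell()
    entry_uuid = None

    while True:
        line = stream.readline()
        if not line or not line.strip():
            # Blank line or end of file closes the current record.
            if entry_uuid is not None:
                index.append((entry_uuid, record_start))
            if not line:
                break
            record_start = stream.tell()
            entry_uuid = None
        elif line.lower().startswith(b"entryuuid:"):
            entry_uuid = line.split(b":", 1)[1].strip().decode()

    # The index still has to be ordered as a whole, in a separate step.
    index.sort()
    return index


ldif = io.BytesIO(
    b"dn: dc=example\nentryUUID: 2\n\n"
    b"dn: ou=x,dc=example\nentryUUID: 1\n"
)
index = build_uuid_offset_index(ldif)
```

A loader could then seek to each offset in index order and read the entries in UUID order without sorting the entries themselves.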

This sounds broken to me; that means if you try to load an LDIF from
some other software that also includes entryUUIDs, but which are not
generated in the order that you use, your master table will be in the
wrong order.


--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
