Norval Hope wrote:
Sorry this thread is getting so long (I seem to have that effect)...
Anything important and this complex takes effort.
BTW, let me also mention right away that several of my decisions
for this redesign were based on replication requirements.
So there may be some disparity in what we are trying to accomplish.
On 11/24/06, Alex Karasulu <[EMAIL PROTECTED]> wrote:
Norval Hope wrote:
...
> 1. I'd be much happier if the ".schema file => schema partition"
> tool were instead (or also) available as an optional start-up
> mechanism activatable by uncommenting support in server.xml. In the
> use-cases dear to my heart users are able to easily register dynamic
> custom partitions along with the .schema files they depend on by
> simply placing files in various filesystem directories (ala
> appservers) rather than having to run separate utilities.
The utility can also generate an LDIF from .schema files (to add schema
changes) that can be applied once on startup, which effectively gives you
what you want, right?
Given this
> point I'd most probably do away with a maven plugin for the ".schema
> => schema partition" bit and replace it with code to determine whether
> the .schema partition needed to be populated with bootstrap
> information on its first run after deployment (from .schema files
> included in a release .jar). For dynamic updates/additions of .schema
> files the relevant filesystem directories could be polled for changes
> periodically (again ala appservers).
Yeah, there is a problem here with having two copies of the same data.
Which one is the authoritative copy? We'll have the same data in a
.schema file on disk and in the DIT. Where do we make changes when the
schema is altered via the DIT? What do we do if the schema files are
changed on disk? What if there are conflicts? How will they be
resolved?
My point is that in VD-like cases such as mine, AD is merely a custodian
of a schema for a custom partition and is in no sense managing it:
Ok this is a good point! And you're right. I agree that AD when acting
as a virtual directory needs to simply publish authoritative schema
information pulled from the target system.
If it does store this information (which is not good) it must be
read-only to prevent conflicts.
a. It should be treated as read-only by AD; there is no point in
changing it anywhere other than at the target system with which the custom
partition communicates. The authoritative source is the target system.
AD is just acting as a pass-through.
Now is the schema expected to change if the data is changed on the
target system?
b. For the same reason it doesn't make sense for AD to persist the
schema information in this case: the custom partition may be
explicitly removed while AD is running, or its deployment bundle
removed and AD restarted, in which case I'd want all trace of the
schema info to disappear from AD when its associated partition
disappeared.
Hmmm this makes sense as well.
Even in non-VD cases, I imagine the bulk of the schemas currently
imported into AD are best considered static, in the sense that end-user
modification of them at runtime could easily destabilise the server.
Oh yeah. This is something we can discuss until the cows come home.
Many LDAP servers allow you to change schema information even when
entries in the server use the very entities being changed.
This is a very dangerous thing to do because it makes the content and
the server itself unstable. Schema changes that modify or delete
entities are really dangerous. Adds, on the other hand, are fine, and
this is generally the way in which schema changes are made.
TOOLING IS THE KEY!!!
NOTE: BUILD THIS FUNCTIONALITY INTO LDAP STUDIO.
The only way to make sure updates and deletes to schema entities do not
make the directory inconsistent (without slowing down the server) is to
use tools to analyze the effects of schema changes on the entity population.
When a schema is governed by an RFC or a spec authored by a third
party, it would seem that end-user modifications of it (except
perhaps additions) would generally be outlawed.
That makes perfect sense; however, you still have the ability to change
published schema in most LDAP servers. IMO this is pretty bad form, when
the proper way to go would be to extend an objectClass or define a new
attribute if an existing one does not suit your needs.
The biggest problem with those bastardizing schema is that they don't
have an IANA-assigned enterprise number, and they think changing existing
standard schema is the best way to cope. This is bad news.
Where such schemas are
used internally by the server, then updating them implies needing to
update the server's code at the same time, no?
Not necessarily. I think you're referring to the extra code elements
like normalizers, syntaxCheckers, and comparators.
An example is best here. If you make a change to an objectClass, say
adding an additional MAY attribute, then there is no code change required.
In most cases code changes are *NOT* necessary. Here's an example of a
code change:
You create a new social security (SS) attribute with its own syntax.
Now you need a syntaxChecker for that new SS syntax to perform
validation. Say you want some nice format for US SS numbers like
666-66-6666. Then your syntaxChecker can enforce this.
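A minimal sketch of that SS syntaxChecker idea (ApacheDS's real SyntaxChecker is a Java interface; this hypothetical check just shows the validation role such a component plays):

```python
import re

# Hypothetical syntaxChecker for a US social-security-number syntax:
# accept only values formatted like 666-66-6666.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def is_valid_ssn_syntax(value):
    """Return True iff the value conforms to the ddd-dd-dddd format."""
    return bool(SSN_PATTERN.match(value))
```

The server would call such a checker on every add/modify of the attribute to reject malformed values.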
> 2. Being able to change schema information is a very power-user
> feature, but I'd imagine that a much more common usage is simply
> wanting to add extra read-only schema information (matching various
> RFCs and/or basically static schemas defined by third party vendors)
> after deployment. In my usecases storing the thirdparty (i.e.
> non-core) schema information persistently is actually a minus rather
> than a plus; I'd prefer my users could deploy another custom partition
Another partition?
Here I mean a custom partition authored by one of my clients. As per
appservers, they deploy a bundle causing AD to expose a new
partition.
Ok I see.
> with updated schema information and restart AD without having to worry
> about clashes with existing information. Is it theoretically possible
> to identify various schema subtrees as "read-only" so that they can't
> be changed and aren't persisted, but are instead transiently populated
> from .schema files at start-up?
Might be able to do this but I'm very against the idea of parsing
.schema files on startup. Plus there are things you cannot store in
.schema files that you can store in the DIT. Like normalizers,
syntaxCheckers and comparators.
Ok, if you're against reading .schema files (or "schema+ " files that
contain the extra information you mention) then it sounds like I'll
need to keep my support as a custom patch to AD instead.
Well don't give up just yet. We need to figure something out for your
needs. I'm starting to think we may need a special project just for
virtual directories where the schema subsystem is designed a bit
differently.
Or we need to add virtualization capabilities into this new schema design.
Don't worry we'll figure something out.
On normalizers, syntaxCheckers etc., am I right in thinking that,
regardless of the syntax of the text file you use as your initial
source, there is the problem that ultimately you need to bind code /
behaviour to their definitions: other than name(s), OID etc., a
normalizer is basically the code that implements the normalization,
right?
Yep, I was thinking this is bytecode in an entry for a normalizer element
in the schema area.
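To make the "a normalizer is basically code" point concrete, here is a toy sketch in the spirit of caseIgnoreMatch. Real LDAP string preparation (RFC 4518) is considerably more involved, and ApacheDS normalizers are Java classes loaded by the server:

```python
def case_ignore_normalize(value):
    """Toy normalizer: lower-case the value and collapse runs of
    whitespace so that equivalent attribute values compare equal."""
    return " ".join(value.lower().split())
```

Two values match under the associated matching rule exactly when their normalized forms are equal, which is why the behaviour cannot be captured by a declarative .schema description alone.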
If so, then allowing people to add their own (not included in the AD
release) is going to involve classloading issues etc., as well as
dealing with the textual descriptive file.
Yep. We're going to need to find a nice way to deal with this.
I apologize if I'm talking crap, just trying to understand these other
objects a bit better.
No you're fine. Don't stress.
> 3. Whether modifying schema information via piece-meal updates or
> whole .schema file imports, we face questions re versioning / draining
> of pending requests referring to old version of the schema etc. Is the
> race condition between schema changes and other operations referring
> the schema something that needs to be considered now, say by
> synchronizing access to the schema partition?
Schema information under this new design is just like any kind of data
within the server. The same shared/exclusive lock requirements apply
wrt read/write operations.
With meta information like schema isn't the problem a bit worse
though? What I'm thinking about is this sort of case (given MINA
worker threads are executing concurrently):
a. user1 submits modify of attr "a" of object o1 of objectclass c1
(MINA thread 1)
b. user2 submits delete of attr "a" from schema for c1 (MINA thread 2)
where b. implies a lock on any attempts to change attr "a" in any
instance of c1, and a. implies a lock on changing the schema for c1
(or at least modifying the type of / deleting attr "a", anyway).
Good point. Yes the locking can be made more complex like this.
However presently many LDAP servers leave such changes as undefined.
You're warned not to mess with things like this.
This is not satisfactory if you ask me. However dealing with this topic
could take a long time. Nothing is defined in the protocol. Whatever
we decide to do to manage this situation would have to be custom designed.
So isn't it a bit different because locks need to flow forward to /
back from meta information?
Yes it is different. You have to lock all changes to entries whose
objectClass is the OC being changed.
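The shared/exclusive interplay described above could look roughly like this: entry operations take a shared hold on their objectClass's schema, while a schema change to that objectClass needs it exclusively. This is a hand-rolled reader-writer lock for illustration only; nothing here is ApacheDS code, and a real design would also need deadlock and fairness considerations:

```python
import threading

class SchemaLock:
    """Per-objectClass lock: entry ops share it, schema changes own it."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # in-flight entry operations using the OC
        self._writer = False   # a schema change to the OC in progress

    def acquire_entry_op(self):
        # Shared: e.g. user1's modify of attr "a" on an entry of c1.
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_entry_op(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_schema_change(self):
        # Exclusive: e.g. user2's delete of attr "a" from c1's schema.
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_schema_change(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

With this shape, concurrent entry operations on c1 proceed together, but a schema change to c1 waits for them to drain and blocks new ones while it runs.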
> I know my focus is out of whack with AD's primary objectives, in that
> I don't use it as a persistent store at all,
NP.
but even so I see
> populating at start-up rather than maven plugin + import utility
Note that this maven plugin is not for general use. It is used to
pre-build the schema partition that will be deposited on disk if the
schema partition has not yet been created.
Sure, but it relies on much the same code as the proposed LDIF tool.
Yeah so I guess we could include it in the server but this feels messy.
Right now I'm thinking there has to be a better solution to this problem.
Perhaps a partition can provide a method in its interface that exposes
a custom schema associated with the partition, which is its own SAA.
Basically the partition can expose access to a schema object that is a
facade for accessing various registries. This automatically includes
the partition's schema information in the global registries (it is joined).
The schema can also be registered with the schema subsystem as a virtual
schema marked as read only and injected dynamically into the ou=schema
area. Replication-wise there are no issues with this. Basically a
virtual schema will not be replicated with physical schema info.
This way partition startup can handle just how this information is
obtained (parsed etc) yet the way it is exposed is the same. The server
can then handle this properly.
Need to think more about this idea.
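Purely as a thought experiment on the idea above, the partition-exposed schema facade might look something like this. Every class and method name here is invented for illustration and is not ApacheDS API:

```python
from abc import ABC, abstractmethod

class PartitionSchema(ABC):
    """Read-only schema a partition publishes, to be joined into the
    server's global registries as a virtual schema."""

    @abstractmethod
    def attribute_types(self):
        """attributeType definitions this partition needs."""

    @abstractmethod
    def object_classes(self):
        """objectClass definitions this partition needs."""

    def is_read_only(self):
        # A virtual schema is read-only and never replicated.
        return True

class DemoSchema(PartitionSchema):
    # Stand-in for a schema the partition parsed at startup, e.g. from
    # a bundled .schema file; the server need not care how.
    def attribute_types(self):
        return ["demoAttr"]

    def object_classes(self):
        return ["demoClass"]
```

The attraction is exactly the decoupling described: how the partition obtains its schema (parsing, remote lookup) is its own business, while the way the server consumes and injects it into ou=schema stays uniform.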
As for the import utility, it can just generate an LDIF that you can
load on startup. You can provide schemas in LDIF format for your users.
The good thing with AD is that if you load an LDIF on startup AD marks
that LDIF file as already having been loaded and will not load it again.
It keeps a record of what was loaded when under the ou=system area.
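The load-once bookkeeping might be sketched like this. ApacheDS records the fact under the ou=system area; this illustrative stand-in uses a plain dict, and also hashes the content so a changed file would be noticed, which is an assumption about desirable behaviour rather than what AD actually does:

```python
import hashlib

class LdifLoadLog:
    """Track which LDIF files have already been applied at startup."""

    def __init__(self):
        self._loaded = {}  # file name -> digest of last applied content

    def should_load(self, name, content):
        digest = hashlib.sha256(content.encode()).hexdigest()
        if self._loaded.get(name) == digest:
            return False   # identical file already applied; skip it
        self._loaded[name] = digest
        return True
```

Keying on content rather than just the file name is one way to reconcile AD's load-once guard with the wish (below) to pick up updated schema files on restart.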
Understood.
My problem is that one of my design goals is to keep the work required by
my client custom partition writers to an absolute minimum. Currently
they deploy a bundle which can optionally include a .schema file, and
that's it. I need to maintain that simplicity, so whether it's in the
core AD code (looking very unlikely, I gather) or via a custom patch to
AD that I maintain, I have to hide any steps required to incorporate a
new schema into the server.
Also, the fact that the LDIF information is persisted and guarded
against reloading is actually a minus in my case, because:
a) I want to reload the schema information each time, because it is
maintained by the author of a custom partition bundle, who may have
updated it in line with an updated version of their bundle code
b) If the schema information for a custom partition is persisted
then I have a problem getting rid of it when AD starts up next time
and this custom partition is no longer deployed.
I planned to deal with the extra info (normalizers etc you mention
above) by looking for code associated with .schema files that defined
the required extra java classes. The need for custom additions to the
existing schema files in this space seems very much a boundary case to
me anyway; these are the stats on such extensions that exist today:
Apache.schema:
    comparators: 3, matching rules: 3, normalizers: 3, syntax checkers: 0, syntax producers: 0
NIS.schema:
    comparators: 1, matching rules: 1, normalizers: 1, syntax checkers: 2, syntax producers: 2
Inetorgperson:
    comparators: 4, matching rules: 4, normalizers: 4, syntax checkers: 0, syntax producers: 0
System:
    comparators: 27, matching rules: 28, normalizers: 27, syntax checkers: 59, syntax producers: 59
where a fair number of the implementations of these various extensions
look like stubs. As I raised earlier in this diatribe, isn't it very
likely that any such extensions required for a third-party schema will
require their own custom code?
Again not necessarily.
as a
> universal plus in terms of flexibility / amount of code required.
I think there are some points I did not make clear. The schema partition is a
single partition that will always be present just like the system
partition. You will not be loading schema info into just any partition.
This partition is dedicated and fixed at ou=system. Regardless of the
VD you're building you'll still need to have this schema partition or
ApacheDS or your derived virtual directory will not start.
What are some of your requirements for the VD you're working on?
Alex
To try and put it in a nutshell, the requirements on my solution are as
follows:
1. It must be possible to dynamically register read-only (from
AD's viewpoint) schema information associated with dynamically
registered custom partitions, to facilitate AD acting as a
pass-through container hosting custom partitions acting as adapters
from LDAP to various target systems (which may themselves be LDAP
(using a different schema), or other technologies)
2. Such schema information needs to be loaded and readily
upgradable using a simple and commonly used standard representation
(i.e. OpenLDAP .schema files), which in rare cases may need to be
augmented with extra code defining and implementing normalizers /
matching rules etc as dictated by the schema in question.
3. When the last dynamic custom partition requiring a collection
of schema information is deregistered, this schema information should
no longer be exposed by AD. Additionally, AD should start up with only
its standard schemas loaded, and schemas required for dynamic custom
partitions should be added lazily as these partitions are accessed and
the schema information becomes necessary.
4. In short, having AD persist the schema information for these
possibly transient dynamic custom partitions is a hindrance rather
than a help.
At any rate, it seems like my requirements are completely disjoint
from what you want to achieve in the schema subsystem redesign.
I already have a solution meeting my requirements by removing the need
for the existing Maven schema plugin, and instead allowing schema
content to be imported to an in-memory representation at start-up.
This solution is only a stepping stone to a more dynamic one, which
requires doing away with the BootstrapRegistries stuff amongst other
things.
I can help implement your plan and then rejig my current scheme on top
of the new code, but I can't pretend that I'm not a little
disappointed that there isn't a solution addressing both the core
directory's and my "pass-through" type requirements at the same time.
Ok, we can think more about how to make AD bend over backwards to do this
right, or we can just create another subproject to deal with
virtualization, synchronization and other things.
Dave asked about this. Now you have VD needs. I'm seeing a trend here.
WDYT?
Alex