Stefan Zoerner wrote:

Hi all!

Hey sorry for taking so long to respond.

Here is the whole story:
I faced the problem that the compare operation does not adhere to the matching rules. Therefore I successfully modified the CompareHandler class in org.apache.ldap.server.protocol to do this (whether this is the best place to fix this problem is not the question here).

Ok, some theory behind these constructs might shed some light on what role they serve in the server. Most LDAP servers have a means to extend the schema; however, this means is extremely limited when it comes to defining new Syntaxes or new MatchingRules. Really, these constructs are often built into the server and cannot be changed without code changes.

When I started designing the schema subsystem of ApacheDS (still not finished), I wanted it to be extensible with new Syntaxes and new MatchingRules. To do this I had to understand the fundamental components needed to represent new matchingRules and syntaxes. For syntaxes I created an interface called SyntaxChecker. Every syntax must have a SyntaxChecker so that the schema subsystem can check attribute values for proper syntax. A SyntaxChecker can be a simple regex or an entire parser. As long as the interface is adhered to, the schema subsystem can use it to determine whether correct values are being used for attributeTypes based on a schema.
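To illustrate the "simple regex" case, here is a minimal sketch. The interface name SyntaxCheckerSketch, its isValid method, and the RegexSyntaxChecker class are hypothetical names made up for this example; they are not the actual ApacheDS SyntaxChecker API.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for a syntax checker contract (not the ApacheDS interface).
interface SyntaxCheckerSketch {
    // Returns true if the value conforms to the syntax.
    boolean isValid(String value);
}

// A checker backed by a regular expression; a full parser could
// implement the same interface for more complex syntaxes.
final class RegexSyntaxChecker implements SyntaxCheckerSketch {
    private final Pattern pattern;

    RegexSyntaxChecker(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    @Override
    public boolean isValid(String value) {
        // The whole value must match the pattern, not just a part of it.
        return value != null && pattern.matcher(value).matches();
    }
}
```

For example, new RegexSyntaxChecker("[0-9 ]+") would accept "040 123" and reject "040-123". The pattern itself is only an illustration of a digits-and-spaces rule.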

The other half, dealing with Comparators and Normalizers, is much more complex, and for this you must really understand what a matchingRule does. The server uses matching rules to determine equality and ordering. Before it can do this, string prep must be run on some values (normalization) to remove the chance for variance to enter the picture. Hence matchingRules can be broken down into Comparators and Normalizers. Some may think a Normalizer is syntax specific; however, how you want to match affects normalization, not the syntax. For example, if I have an attribute that is a simple string and I want to perform a case insensitive match, then the normalization differs from that of a case sensitive match. This shows how normalization is specific to matching and not just to a syntax.
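The split into a Normalizer and a Comparator can be sketched roughly as follows. MatchingRuleSketch and its factory methods are hypothetical names for illustration only, and the use of plain Java functional types is an assumption, not how ApacheDS wires these together. Note that both rules work on the same string syntax; only the normalization differs.

```java
import java.util.Comparator;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical model: a matching rule is a normalizer plus a comparator.
final class MatchingRuleSketch {
    private final UnaryOperator<String> normalizer;
    private final Comparator<String> comparator;

    MatchingRuleSketch(UnaryOperator<String> normalizer, Comparator<String> comparator) {
        this.normalizer = normalizer;
        this.comparator = comparator;
    }

    // Both values are normalized first, then compared for equality.
    boolean matches(String a, String b) {
        return comparator.compare(normalizer.apply(a), normalizer.apply(b)) == 0;
    }

    // Case sensitive matching: no normalization at all.
    static MatchingRuleSketch caseExact() {
        return new MatchingRuleSketch(UnaryOperator.identity(), Comparator.naturalOrder());
    }

    // Case insensitive matching: same syntax, different normalization.
    static MatchingRuleSketch caseIgnore() {
        return new MatchingRuleSketch(s -> s.toLowerCase(Locale.ROOT), Comparator.naturalOrder());
    }
}
```

With this sketch, caseIgnore().matches("Stefan", "STEFAN") is true while caseExact().matches("Stefan", "STEFAN") is false.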

Anyway, Normalizers and Comparators are the basis of matchingRules. A new matchingRule must have these defined for its OID, as you probably saw.

It worked better, but not all matching rules satisfied my needs (some are missing).

Yep we have not filled in any of these really. Just some very critical ones so the directory can operate. We need help in filling these in.

One of these is telephoneNumberMatch, and I changed SystemComparatorProducer to replace ComparableComparator with something that implements the missing matching rule.

Cool.  This is exactly what we need to do.

Two options here to implement this Comparator:
1. Just implement the Comparator interface, and call it TelephoneNumberComparator
2. Create a Normalizer for telephone numbers (removing white space and hyphens, transforming to e.g. lower case), and instantiate a NormalizingComparator in SystemComparatorProducer which uses it

Right these would be the two steps to follow. One for the Comparator and another for the normalizer.
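A rough sketch of what the two pieces could look like together. TelephoneNumberSketch is a hypothetical class for illustration; it does not use the real ApacheDS Normalizer or NormalizingComparator types, and the rule it applies (drop whitespace and hyphens, fold to lower case) is simply the one described above.

```java
import java.util.Comparator;
import java.util.Locale;

// Hypothetical sketch of a telephone number normalizer and a comparator built on it.
final class TelephoneNumberSketch {

    // Normalizer part: remove whitespace and hyphens, then fold to lower case.
    static String normalize(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (Character.isWhitespace(c) || c == '-') {
                continue; // formatting characters do not take part in matching
            }
            sb.append(c);
        }
        return sb.toString().toLowerCase(Locale.ROOT);
    }

    // Comparator part: compares the normalized forms of both values.
    static final Comparator<String> COMPARATOR =
            Comparator.comparing(TelephoneNumberSketch::normalize);
}
```

In this sketch the normalized form only exists during the comparison. The values passed in are not modified, so "+49 40 123-456" and "+4940123456" compare as equal while each keeps its own formatting.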

This leads me (finally) to the question of where normalizers are intended to be used. I do not want my telephone number to get "normalized" before storing it, because that would delete the formatting, which people might like to preserve.

Good question.  Let me try to answer this ...

Normalization is critical when attempting to match two values. Sometimes there is extra white space, and it can be removed to better enable correct comparisons. Sometimes normalization is not even needed, if the syntax is very rigid without any room for case or space variance. Consider matching for cn=Stefan Zoerner, which is in the directory (this is what the user who added the entry put as the cn attribute value). Now another user who is searching for these entries may ask for cn=STEFAN ZOERNER with 3 spaces between STEFAN and ZOERNER. The two users may be the same or different users. The second user should be able to pull the same entries regardless of which of the filters below he uses:

(cn=STEFAN   ZOERNER)
(cn= Stefan ZOerner)
(cn=stefan                    zoerner)

So a normalizer would come into play here by generating a canonical representation of these inputs. ApacheDS by default case normalizes by reducing case to lowercase and then comparing the filter string with the normalized attribute value stored within the directory: this is only done for matching rules that ignore case. For whitespace normalization ApacheDS tries to follow the string prep operation defined in various IETF documents. However, I'm sure we fall short. The general rule of thumb for ApacheDS is to whitespace normalize while retaining string tokenization order. Meaning we do a deep trim of values, replacing each run of whitespace with a single space character. Whitespace on the ends is discarded. This btw is only done when space, and whitespace in general, is not escaped.
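As a minimal sketch of the deep trim plus case folding described above: DeepTrimSketch is a hypothetical name, this is not the ApacheDS normalizer code, and it assumes no escaped whitespace, which this simplified version does not handle.

```java
import java.util.Locale;

// Hypothetical sketch of a case-ignoring, whitespace-collapsing normalizer.
final class DeepTrimSketch {

    static String normalize(String value) {
        // Collapse every run of whitespace into one space, keeping token order,
        // then drop whatever single space is left at either end.
        String collapsed = value.replaceAll("\\s+", " ").trim();
        // Case folding applies only to matching rules that ignore case.
        return collapsed.toLowerCase(Locale.ROOT);
    }
}
```

Applied to the three filter values above ("STEFAN   ZOERNER", " Stefan ZOerner", and "stefan                    zoerner"), this yields the same canonical string "stefan zoerner", which is also what the stored value "Stefan Zoerner" normalizes to.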

Hope this helps,
Alex
