Re: Normalizer vs. Comparator

Alex Karasulu Tue, 06 Sep 2005 12:03:26 -0700

Stefan Zoerner wrote:

Hi all!


Hey sorry for taking so long to respond.

Here is the whole story:
I faced the problem that the compare operation does not adhere to the matching rules. Therefore I successfully modified the CompareHandler class in org.apache.ldap.server.protocol to do this (whether this is the best place to fix this problem is not the question here).

OK, some theory behind these constructs might shed some light on what role they serve in the server.

Most LDAP servers have a means to extend the schema; however, this means is extremely limited when it comes to defining new Syntaxes or new MatchingRules. These constructs are often built into the server and cannot be changed without code changes.

When I started designing the schema subsystem of ApacheDS (still not finished) I wanted it to be extensible with new Syntaxes and new MatchingRules. To do this I had to understand the fundamental components needed to represent new matchingRules and syntaxes. For syntaxes I created an interface called SyntaxChecker. Every syntax must have a SyntaxChecker so that the schema subsystem can check attribute values for proper syntax. This SyntaxChecker can be a simple regex or an entire parser. As long as the interface is adhered to, the schema subsystem can use it to determine whether correct values are being used for attributeTypes based on a schema.
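To make the "simple regex" case concrete, here is a minimal sketch. The interface name SimpleSyntaxChecker, the method isValidSyntax, and the class RegexSyntaxChecker are made up for illustration and are not taken from the ApacheDS code base; the real SyntaxChecker interface may look different.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for a syntax checker interface (an assumption,
// not the actual ApacheDS SyntaxChecker).
interface SimpleSyntaxChecker {
    // Returns true if the value conforms to the syntax.
    boolean isValidSyntax(Object value);
}

// A checker backed by a single regular expression.
final class RegexSyntaxChecker implements SimpleSyntaxChecker {
    private final Pattern pattern;

    RegexSyntaxChecker(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    @Override
    public boolean isValidSyntax(Object value) {
        // Only strings are checked here; null and other types are rejected.
        return value instanceof String
                && pattern.matcher((String) value).matches();
    }
}
```

With this sketch, new RegexSyntaxChecker("[0-9]+") would accept "12345" and reject "12a45". A syntax too complex for a regex would put a parser behind the same interface.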

The other half, dealing with Comparators and Normalizers, is much more complex, and for this you must really understand what a matchingRule does. The server uses matching rules to determine equality and ordering. Before it can do this, string prep (normalization) must be run on some values to remove the chance for variance to enter the picture. Hence matchingRules can be broken down into Comparators and Normalizers. Some may think a Normalizer is syntax specific; however, it is how you want to match that affects normalization, not the syntax. For example, if I have an attribute that is a simple string and I want to perform a case-insensitive match, then the normalization differs from that of a case-sensitive match. This shows how normalization is specific to matching and not just to a syntax.

Anyway, Normalizers and Comparators are the basis of matchingRules. A new matchingRule must have these defined for its OID, as you probably saw.
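As a rough illustration of that split, the sketch below pairs a normalizer with a comparator and shows two rules over the same plain string syntax that differ only in normalization. SketchMatchingRule and its factory methods are hypothetical names and not part of the ApacheDS API.

```java
import java.util.Comparator;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical pairing of a normalizer and a comparator (an assumption
// for illustration; not the ApacheDS schema classes).
final class SketchMatchingRule {
    private final UnaryOperator<String> normalizer;
    private final Comparator<String> comparator;

    SketchMatchingRule(UnaryOperator<String> normalizer,
                       Comparator<String> comparator) {
        this.normalizer = normalizer;
        this.comparator = comparator;
    }

    // Normalize both operands first, then compare the canonical forms.
    boolean matches(String a, String b) {
        return comparator.compare(normalizer.apply(a), normalizer.apply(b)) == 0;
    }

    // Case-sensitive rule: no normalization at all.
    static SketchMatchingRule caseExact() {
        return new SketchMatchingRule(UnaryOperator.identity(),
                                      Comparator.naturalOrder());
    }

    // Case-insensitive rule: same syntax, same comparator, different normalizer.
    static SketchMatchingRule caseIgnore() {
        return new SketchMatchingRule(s -> s.toLowerCase(Locale.ROOT),
                                      Comparator.naturalOrder());
    }
}
```

Both rules accept exactly the same values; only caseIgnore() treats "Stefan" and "STEFAN" as equal, which is why the normalizer belongs to the matching rule rather than to the syntax.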

It worked better, but not all matching rules satisfied my needs (some are missing).

Yep, we have not really filled in any of these, just some very critical ones so the directory can operate. We need help filling these in.

One of these is telephoneNumberMatch, and I changed SystemComparatorProducer to replace ComparableComparator with something that implements the missing matching rule.

Cool.  This is exactly what we need to do.

Two options here to implement this Comparator:
1. Just implement the Comparator interface; call it TelephoneNumberComparator.
2. Create a Normalizer for telephone numbers (removing white space and hyphens, transforming to e.g. lower case), and instantiate a NormalizingComparator in SystemComparatorProducer which uses it.

Right, these would be the two steps to follow: one for the Comparator and another for the Normalizer.
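A minimal sketch of the second option follows, assuming string values only. The class names TelephoneNumberNormalizerSketch and NormalizingComparatorSketch are placeholders chosen for this example; the existing NormalizingComparator in the server may have a different constructor and signature.

```java
import java.util.Comparator;
import java.util.Locale;

// Hypothetical telephone number normalizer: drops whitespace and hyphens,
// then lowercases whatever remains.
final class TelephoneNumberNormalizerSketch {
    static String normalize(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (!Character.isWhitespace(c) && c != '-') {
                sb.append(c);
            }
        }
        return sb.toString().toLowerCase(Locale.ROOT);
    }
}

// Hypothetical comparator that normalizes both operands before comparing.
// The values passed in are never modified.
final class NormalizingComparatorSketch implements Comparator<String> {
    @Override
    public int compare(String a, String b) {
        return TelephoneNumberNormalizerSketch.normalize(a)
                .compareTo(TelephoneNumberNormalizerSketch.normalize(b));
    }
}
```

In this sketch normalization happens only inside compare(), so "+1 408-555-1234" and "+14085551234" compare as equal while each string keeps its own formatting.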

This leads me (finally) to the question of where normalizers are intended to be used. I do not want my telephone number to get "normalized" before storing it, because that would delete the formatting, which people might like to preserve.


Good question.  Let me try to answer this ...

Normalization is critical when attempting to match two values against each other. Sometimes there is extra white space, and it can be removed to better enable correct comparisons. Sometimes normalization is not even needed, if the syntax is very rigid without any room for case or space variance. Consider matching for cn=Stefan Zoerner, which is in the directory (this is what the user who added the entry put as the cn attribute value). Now another user who is searching for these entries may ask for cn=STEFAN ZOERNER with 3 spaces between STEFAN and ZOERNER. The two users may be the same or different users. The second user should be able to pull the same entries regardless of which of the filters below he uses:


(cn=STEFAN   ZOERNER)
(cn= Stefan ZOerner)
(cn=stefan                    zoerner)

So a normalizer would come into play here by generating a canonical representation of these inputs. ApacheDS by default case normalizes by reducing everything to lowercase and then comparing the filter string with the normalized attribute value stored within the directory: this is only done for matching rules that ignore case. For whitespace normalization ApacheDS tries to follow the string prep operation defined in various IETF documents. However, I'm sure we fall short. The general rule of thumb for ApacheDS is to whitespace normalize while retaining string tokenization order, meaning we do a deep trim of values, replacing each run of whitespace with a single space character. Whitespace on the ends is discarded. This, by the way, is only done when space and whitespace in general are not escaped.
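The deep trim plus lowercasing described above can be sketched as follows. DeepTrimSketch is a hypothetical helper, not the ApacheDS implementation, and it deliberately leaves out handling of escaped whitespace and the full string prep rules.

```java
import java.util.Locale;

// Hypothetical helper showing deep trim and case folding for a
// case-ignoring matching rule. Escaped whitespace is NOT handled here.
final class DeepTrimSketch {
    static String deepTrimToLower(String value) {
        return value.trim()                   // discard whitespace on the ends
                    .replaceAll("\\s+", " ")  // collapse inner runs to one space
                    .toLowerCase(Locale.ROOT);
    }
}
```

Applied to the three filter values above, this sketch yields the same canonical string "stefan zoerner" each time, and the same form results from the stored value "Stefan Zoerner", so all three filters match the entry.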


Hope this helps,
Alex
