Zhang Weiwu,
I think most people who have looked into this would agree with Terry. If
you choose option 1, you will probably find that your directory
software is designed to return relatively small amounts of data and is
just not efficient at moving large blobs of data like the documents that
you are thinking of storing. You will want to do proof-of-concept
performance testing before committing to this approach to make sure the
delivered system would have adequate response time under load.
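
A proof-of-concept test of this kind can be quite small. The following is only a sketch in Python, assuming you can wrap "retrieve one document" from whichever repository is under test (LDAP binary attribute, file server, DBMS blob) in a callable. The names `fetch`, `measure_under_load`, and `workers` are hypothetical and chosen for illustration; they do not come from any directory product.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def time_call(fetch, key):
    """Return the wall-clock seconds one fetch(key) call takes."""
    start = time.perf_counter()
    fetch(key)
    return time.perf_counter() - start


def measure_under_load(fetch, keys, workers=8):
    """Call fetch(key) for every key from `workers` threads at once.

    `fetch` is a placeholder (an assumption) for whatever retrieves one
    document from the repository being evaluated.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = sorted(pool.map(lambda k: time_call(fetch, k), keys))
    if not latencies:
        raise ValueError("no keys to fetch")
    # Approximate 95th percentile: the sample at 95% of the sorted list.
    p95_index = min(len(latencies) - 1, int(0.95 * len(latencies)))
    return {
        "count": len(latencies),
        "median": statistics.median(latencies),
        "p95": latencies[p95_index],
        "max": latencies[-1],
    }
```

Running the same harness against each candidate repository, with documents of realistic size and a realistic number of workers, should give you comparable numbers before you commit to one design.
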
In option 2 it is true that you will have to maintain two repositories,
and it will be difficult for you to keep them consistent. Many kinds of
system bugs and failures will cause an update to be completed on one
repository and not the other. If you choose this approach, be sure to
develop a utility which will check consistency between the two
repositories. Such a utility will tend to run slowly, just because it
has to search both repositories exhaustively. Make sure the utility
completes fast enough that you can run it frequently, or after a known
system failure, and so detect your consistency problems quickly when
they are small, manageable, and not yet noticed by your customers. Keep
in mind that if the utility runs while your repositories are in use and
being updated, false inconsistencies may appear due to updates that have
completed in one repository and not the other; the longer the utility
takes to complete, the more "false positives" you will have. If "false
positives" are not sorted out quickly, people will lose confidence in
the consistency checker output and the false positives will mask real
problems. On the plus side, it is probably easiest to design a
high-performance product using this option because your documents will
be served by software that is designed specifically for moving big
chunks of data (like files containing documents) around, and your
directory information will be served by software specifically designed
for efficient searches.
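
To make the consistency-checker idea concrete, here is a minimal sketch in Python. It assumes each repository can be listed as a mapping from entry DN to the set of document names attached to that entry; the reader functions passed in are hypothetical placeholders for your own LDAP search and file listing. Reporting only the discrepancies that persist across repeated passes is one possible way to screen out false positives from in-flight updates. It is my assumption, not an established method.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Snapshot = Dict[str, Set[str]]   # entry DN -> names of attached documents
Problem = Tuple[str, str, str]   # (DN, document name, what is wrong)


def diff_snapshots(directory: Snapshot, documents: Snapshot) -> Set[Problem]:
    """Return every document that appears in only one of the two snapshots."""
    problems: Set[Problem] = set()
    for dn in directory.keys() | documents.keys():
        in_dir = directory.get(dn, set())
        in_docs = documents.get(dn, set())
        for name in in_dir - in_docs:
            problems.add((dn, name, "missing from document store"))
        for name in in_docs - in_dir:
            problems.add((dn, name, "missing from directory"))
    return problems


def persistent_problems(
    read_directory: Callable[[], Snapshot],
    read_documents: Callable[[], Snapshot],
    passes: int = 2,
    wait: Callable[[], None] = lambda: None,
) -> Set[Problem]:
    """Keep only the discrepancies that show up in every pass.

    A discrepancy caused by an update that was still in flight should
    disappear by the next pass; a real one should still be there.
    """
    if passes < 1:
        raise ValueError("passes must be at least 1")
    remaining: Optional[Set[Problem]] = None
    for i in range(passes):
        if i > 0:
            wait()  # e.g. sleep long enough for pending updates to finish
        found = diff_snapshots(read_directory(), read_documents())
        remaining = found if remaining is None else remaining & found
    return remaining if remaining is not None else set()
```

The exhaustive reads are what make such a utility slow, so the number of passes and the wait between them trade run time against the number of false positives.
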
Option 3 attracted a lot of interest in the 1990s when database companies
like Informix and Oracle were positioning their DBMS products as the
place to store all of your data, in whatever form. I believe that there
were a number of success stories in that area. There seems to be less
interest now. I gather it is just very difficult to create one DBMS
product that can efficiently support many concurrent updates (as a DBMS
must), many concurrent queries (as a DBMS must) and also serve big blobs
of read-only data (like documents). The first two capabilities add a
lot of system overhead that works against the third capability. On the
plus side, a DBMS will help you a lot in keeping its repository
consistent with the directory repository. It may be expensive though.
I am writing about enterprise-level DBMSs like Oracle, DB2, etc. that
developed in an update-intensive transaction processing environment.
There are other DBMSs like MySQL that grew out of a read-mostly
industry environment. I don't know much about them and what I wrote
above may not be true of them.
Good luck,
Mark
Terry Gardner wrote:
Best to point to a document server, not store in directory server.
On Jan 13, 2009, at 7:49 PM, Zhang Weiwu wrote:
Hello. In one of the directories we are managing it is desirable to attach
documents to the entries. e.g. attach multiple CVs to an employee entry.
What would be the best practice for such requirement?
1. Directly attach it to the entry using a binary attribute. The
downside: the file name is lost (because a binary attribute holds
the file content as its value, but not the filename). If the file
type is limited to types that contain proper metadata (e.g. TIFF,
PDF) then we can use the document title inside the document as
filename.
2. Maintain a directory on the server file system with the same name
as the DN of the entry. The LDAP client (which is a web application)
would fetch the files from there through FTP or HTTP.
Downside: maintaining two repositories of data.
3. Set up an SQL database holding these data. Downside: same as above.
Currently I am thinking about solution 1 partly because I think limiting
document types to TIFF/PDF is helpful for management reasons as well, so
this limitation wouldn't hurt me too much.
However, this is my first time trying to offer binary documents to users.
What do you recommend?
Thanks & best regards