Zhang Weiwu,

I think most people who have looked into this would agree with Terry. If you choose option 1, I expect you will find that your directory software is designed to return relatively small amounts of data and is just not efficient at moving large blobs of data like the documents that you are thinking of storing. You will want to do proof-of-concept performance testing before committing to this approach, to make sure the delivered system would have adequate response time under load.
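As a rough illustration of the kind of proof-of-concept test I mean, here is a minimal Python sketch. The function names, request count, and worker count are my own assumptions, and `fetch` is a placeholder for whatever call retrieves one document from your directory server; the sketch deliberately contains no real LDAP client code.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def timed_call(fetch: Callable[[], bytes]) -> float:
    """Run one fetch and return how long it took, in seconds."""
    start = time.perf_counter()
    fetch()
    return time.perf_counter() - start


def measure_under_load(
    fetch: Callable[[], bytes], requests: int = 200, workers: int = 20
) -> Dict[str, float]:
    """Issue `requests` fetches from `workers` concurrent threads
    and summarize the observed response times."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = sorted(pool.map(lambda _: timed_call(fetch), range(requests)))
    return {
        "count": len(latencies),
        "median_s": statistics.median(latencies),
        # Simple nearest-rank style estimate of the 95th percentile.
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
        "max_s": latencies[-1],
    }
```

You would pass in a function that reads one binary attribute of realistic size, then compare the median and 95th-percentile figures against the response time your users will accept.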

In option 2 it is true that you will have to maintain two repositories, and it will be difficult for you to keep them consistent. Many kinds of system bugs and failures will cause an update to be completed on one repository and not the other. If you choose this approach, be sure to develop a utility which will check consistency between the two repositories. Such a utility will tend to run slowly, just because it has to search both repositories exhaustively. Make sure the utility completes fast enough that you can run it frequently, or after a known system failure, and so detect your consistency problems quickly when they are small, manageable, and not yet noticed by your customers. Keep in mind that if the utility runs while your repositories are in use and being updated, false inconsistencies may appear due to updates that have completed in one repository and not yet in the other; the longer the utility takes to complete, the more "false positives" you will have. If "false positives" are not sorted out quickly, people will lose confidence in the consistency checker output and the false positives will mask real problems. On the plus side, it is probably easiest to design a high-performance product using this option because your documents will be served by software that is designed specifically for moving big chunks of data (like files containing documents) around, and your directory information will be served by software specifically designed for efficient searches.
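One possible way to keep false positives down in such a checker is to compare the repositories twice and report only the discrepancies that show up in both passes. The Python sketch below is an illustration under my own assumptions: `list_directory` and `list_documents` are hypothetical callables that return the set of keys (for example, entry DNs) known to each repository, and the re-check delay is a value you would have to tune.

```python
import time
from typing import Callable, Set, Tuple

KeyLister = Callable[[], Set[str]]


def diff(directory_keys: Set[str], document_keys: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (keys only in the directory, keys only in the document store)."""
    return directory_keys - document_keys, document_keys - directory_keys


def check_consistency(
    list_directory: KeyLister,
    list_documents: KeyLister,
    recheck_delay: float = 0.0,
) -> Tuple[Set[str], Set[str]]:
    """Compare both repositories twice and keep only the discrepancies
    seen in both passes, so updates that were merely in flight during
    the first pass are not reported."""
    first_dir_only, first_doc_only = diff(list_directory(), list_documents())
    if not first_dir_only and not first_doc_only:
        return set(), set()

    # Give in-flight updates a chance to reach the second repository.
    time.sleep(recheck_delay)

    second_dir_only, second_doc_only = diff(list_directory(), list_documents())
    return first_dir_only & second_dir_only, first_doc_only & second_doc_only
```

A key that is missing on one side in the first pass but present in the second is treated as an update that was still in progress. This does not remove false positives entirely; it only filters those that resolve within the delay.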

Option 3 attracted a lot of interest in the 1990s, when database companies like Informix and Oracle were positioning their DBMS products as the place to store all of your data, in whatever form. I believe that there were a number of success stories in that area. There seems to be less interest now. I gather it is just very difficult to create one DBMS product that can efficiently support many concurrent updates (as a DBMS must), many concurrent queries (as a DBMS must) and also serve big blobs of read-only data (like documents). The first two capabilities add a lot of system overhead that works against the third capability. On the plus side, a DBMS will help you a lot in keeping its repository consistent with the directory repository. It may be expensive though. I am writing of enterprise-level DBMSs like Oracle, DB2, etc. that developed in an update-intensive transaction processing environment. There are other DBMSs like MySQL that grew out of a read-mostly industry environment. I don't know much about them, and what I wrote above may not be true of them.

Good luck,

Mark

Terry Gardner wrote:
Best to point to a document server, not store in directory server.

On Jan 13, 2009, at 7:49 PM, Zhang Weiwu wrote:

Hello. In one of the directories we are managing, it is desirable to attach
documents to the entries, e.g. attach multiple CVs to an employee entry.

What would be the best practice for such a requirement?

  1. Directly attach it to the entry using a binary attribute. The
     downside: the file name is lost (because a binary attribute holds
     the file content as its value, but not the filename). If the file
     type is limited to types that contain proper metadata (e.g. TIFF,
     PDF) then we can use the document title inside the document as
     the filename.
  2. Maintain a directory on the server file system with the same name
     as the DN of the entry. The LDAP client (which is a web application)
     should try to get the files from there through FTP or HTTP.
     Downside: maintaining two repositories of data.
  3. Set up an SQL database holding these data. Downside: same as above.

Currently I am thinking about solution 1, partly because I think limiting
document types to TIFF/PDF is helpful for management reasons as well, so
this limitation wouldn't hurt me too much.

However, this is my first time trying to offer binary documents to users.
What do you recommend?

Thanks & best regards
