[ldap] Re: best practice to attach binary documents to entries?

Mark P. Anderson Tue, 13 Jan 2009 21:33:44 -0800

Zhang Weiwu,

I think most people who have looked into this would agree with Terry. I think that if you choose option 1, you will find that your directory software is designed to return relatively small amounts of data and is just not efficient at moving large blobs of data like the documents that you are thinking of storing. You will want to do proof-of-concept performance testing before committing to this approach to make sure the delivered system would have adequate response time under load.

In option 2 it is true that you will have to maintain two repositories, and it will be difficult for you to keep them consistent. Many kinds of system bugs and failures will cause an update to be completed on one repository and not the other. If you choose this approach, be sure to develop a utility which will check consistency between the two repositories. Such a utility will tend to run slowly, just because it has to search both repositories exhaustively. Make sure the utility completes fast enough that you can run it frequently, or after a known system failure, and so detect your consistency problems quickly when they are small, manageable, and not yet noticed by your customers. Keep in mind that if the utility runs while your repositories are in use and being updated, false inconsistencies may appear due to updates that have completed in one repository and not the other; the longer the utility takes to complete, the more "false positives" you will have. If "false positives" are not sorted out quickly, people will lose confidence in the consistency checker output and the false positives will mask real problems. On the plus side, it is probably easiest to design a high-performance product using this option because your documents will be served by software that is designed specifically for moving big chunks of data (like files containing documents) around, and your directory information will be served by software specifically designed for efficient searches.
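
To make the consistency-checker idea concrete, here is a rough Python sketch of one way to cut down on false positives: compare the two repositories twice and report only the mismatches that show up in both passes. Everything in it is an assumption for illustration. The names check_consistency, read_directory and read_documents are hypothetical, and the two reader callbacks stand in for whatever code would really list (entry DN, document name) pairs from the directory and from the document store.

```python
import time
from typing import Callable, Set, Tuple

# One snapshot is a set of (entry DN, document name) pairs.
Snapshot = Set[Tuple[str, str]]


def diff(directory: Snapshot, documents: Snapshot) -> Tuple[Snapshot, Snapshot]:
    """Return (pairs only in the directory, pairs only in the document store)."""
    return directory - documents, documents - directory


def check_consistency(
    read_directory: Callable[[], Snapshot],   # hypothetical reader callback
    read_documents: Callable[[], Snapshot],   # hypothetical reader callback
    settle_seconds: float = 0.0,
) -> Tuple[Snapshot, Snapshot]:
    """Report only the mismatches that persist across two passes."""
    missing_1, orphans_1 = diff(read_directory(), read_documents())
    if not (missing_1 or orphans_1):
        return set(), set()
    # Give updates that were in flight during the first pass time to land.
    time.sleep(settle_seconds)
    missing_2, orphans_2 = diff(read_directory(), read_documents())
    # A mismatch seen in only one pass is treated as a transient false positive.
    return missing_1 & missing_2, orphans_1 & orphans_2


if __name__ == "__main__":
    directory = {("uid=a,ou=people", "cv.pdf"), ("uid=b,ou=people", "cv.pdf")}
    documents = {("uid=a,ou=people", "cv.pdf")}
    missing, orphans = check_consistency(lambda: directory, lambda: documents)
    print("referenced but not stored:", missing)
    print("stored but not referenced:", orphans)
```

The second pass doubles the cost of a run that finds problems, so this only helps if a single pass is already fast enough to run frequently.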

Option 3 attracted a lot of interest in the 90's when database companies like Informix and Oracle were positioning their DBMS products as the place to store all of your data, in whatever form. I believe that there were a number of success stories in that area. There seems to be less interest now. I gather it is just very difficult to create one DBMS product that can efficiently support many concurrent updates (as a DBMS must), many concurrent queries (as a DBMS must) and also serve big blobs of read-only data (like documents). The first two capabilities add a lot of system overhead that works against the third capability. On the plus side, a DBMS will help you a lot in keeping its repository consistent with the directory repository. It may be expensive though. I am writing of enterprise-level DBMSs like Oracle, DB2, etc. that developed in an update-intensive transaction processing environment. There are other DBMS's like MySQL that grew out of a read-mostly industry environment. I don't know much about them and what I wrote above may not be true of them.


Good luck,

Mark

Terry Gardner wrote:

Best to point to a document server, not store in directory server.

On Jan 13, 2009, at 7:49 PM, Zhang Weiwu wrote:

Hello. In one of the directories we are managing it is desirable to attach
documents to the entries, e.g. attach multiple CVs to an employee entry.

What would be the best practice for such a requirement?

  1. Directly attach it to the entry using a binary attribute. The
     downside: the file name is lost (because a binary attribute holds
     the file content as its value but not the filename). If the file
     type is limited to types that contain proper metadata (e.g. TIFF,
     PDF) then we can use the document title inside the document as
     the filename.
  2. Maintain a directory on the server file system with the same name
     as the DN of the entry. The LDAP client (which is a web application)
     should try to get the files from there through ftp or http.
     Downside: we maintain two repositories of data.
  3. Set up an SQL database holding these data. Downside: same as above.

Currently I am thinking about solution 1, partly because I think limiting
document types to TIFF/PDF is helpful for management reasons as well, so
this limitation wouldn't hurt me too much.

However this is my first time trying to offer binary documents to users.
What do you recommend?

Thanks & best regards
