> I think most people who have looked into this would agree with Terry. I
> think that if you choose option 1, you will find that your directory
> software is designed to return relatively small amounts of data and is
> just not efficient at moving large blobs of data like the documents that
> you are thinking of storing. You will want to do proof-of-concept
> performance testing before committing to this approach to make sure the
> delivered system would have adequate response time under load.

We store some BLOBs in LDAP (such as a user's desktop wallpaper). If they are of "reasonable" size, it works very well. When I tested (which was some time and several versions ago), it was loading and updating the BLOBs that hurt performance and ballooned the logs. I think it works well for read-mostly items; I wouldn't put BLOBs in the DIT that change frequently.

> In option 2 it is true that you will have to maintain two repositories,
> and it will be difficult for you to keep them consistent. Many kinds of
> system bugs and failures will cause an update to be completed on one
> repository and not the other. If you choose this approach, be sure to
> develop a utility which will check consistency between the two
> repositories.

Agreed. Why would you want to build a document repository on LDAP at all? I'm a fan of LDAP, but IMO it seems ill-suited to that purpose.

> Option 3 attracted a lot of interest in the '90s when database companies
> like Informix and Oracle were positioning their DBMS products as the
> place to store all of your data, in whatever form. I believe that there
> were a number of success stories in that area. There seems to be less
> interest now. I gather it is just very difficult to create one DBMS
> product that can efficiently support many concurrent updates (as a DBMS
> must), many concurrent queries (as a DBMS must) and also serve big blobs
> of read-only data (like documents).

Speaking as an Informix shop, I think the loss of interest is just because it is now commonplace and barely worth mentioning. Again, if the BLOBs are read-mostly, performance is very good, and a modern RDBMS can feed them to a client very efficiently. However, you do have to take BLOBs into account in your configuration: Informix and other RDBMSs allow (and recommend) that you create separate partitions (or whatever the specific term is in the RDBMS in question) where the BLOBs are stored apart from the transactional data.
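
For the consistency-check utility suggested for option 2, a minimal sketch might look like the following. It assumes you can export each repository as a mapping from a document ID to a SHA-256 digest of the stored BLOB, so large documents are compared by hash rather than byte by byte. The names `digest`, `check_consistency` and `ConsistencyReport` are hypothetical, and actually pulling the data out of the directory and the database is left out.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


def digest(blob: bytes) -> str:
    """SHA-256 hex digest of a BLOB, so big documents compare cheaply."""
    return hashlib.sha256(blob).hexdigest()


@dataclass
class ConsistencyReport:
    only_in_directory: list[str] = field(default_factory=list)
    only_in_database: list[str] = field(default_factory=list)
    mismatched: list[str] = field(default_factory=list)

    @property
    def consistent(self) -> bool:
        # Consistent only if nothing is missing on either side and no digests differ.
        return not (self.only_in_directory or self.only_in_database or self.mismatched)


def check_consistency(directory: dict[str, str], database: dict[str, str]) -> ConsistencyReport:
    """Compare two {document_id: digest} maps, one exported from each repository
    (hypothetical export format)."""
    dir_ids, db_ids = set(directory), set(database)
    return ConsistencyReport(
        # IDs present in one repository but not the other (e.g. a half-completed update).
        only_in_directory=sorted(dir_ids - db_ids),
        only_in_database=sorted(db_ids - dir_ids),
        # IDs present in both but whose stored content differs.
        mismatched=sorted(i for i in dir_ids & db_ids if directory[i] != database[i]),
    )


if __name__ == "__main__":
    directory = {"doc1": digest(b"alpha"), "doc2": digest(b"beta")}
    database = {"doc1": digest(b"alpha"), "doc2": digest(b"beta v2"), "doc3": digest(b"gamma")}
    report = check_consistency(directory, database)
    print(report.only_in_directory)  # []
    print(report.only_in_database)   # ['doc3']
    print(report.mismatched)         # ['doc2']
    print(report.consistent)         # False
```

Run periodically (e.g. nightly), a report like this at least tells you which documents an interrupted update left out of step; repairing them is a separate decision.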

> The first two capabilities add a
> lot of system overhead that works against the third capability. On the
> plus side, a DBMS will help you a lot in keeping its repository
> consistent with the directory repository. It may be expensive though.
> I am writing of enterprise-level DBMSs like Oracle, DB2, etc. that

If you need a free (as in beer) RDBMS for this kind of work, I'd recommend DB2, which has a free version with no limit on connections.

--
Adam Tauno Williams, Network & Systems Administrator
Consultant - http://www.whitemiceconsulting.com
Developer - http://www.opengroupware.org
