I have found that if I load a large binary out of the DB and xdmp:document-insert() it at a different URI in the DB, the Large data usage for the DB doesn't change, and the Large directory in the forest directory doesn't change either. However, if I load the binary from the filesystem, the Large data usage grows by the size of the large binary files being inserted. This leads me to believe there is some optimization going on: large binaries that originate from the DB just hold pointers to a single master copy of the binary data itself. So if the file is in ten places in the DB, even under different URIs, MarkLogic points all of them back to a single binary file.
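For what it's worth, here is a minimal sketch of the two loading paths I'm comparing (the URIs and the filesystem path are just made-up examples):

```xquery
(: Path 1: copy a large binary that is already in the DB.
   In my tests this does NOT grow the Large data usage —
   the copy appears to share the original's underlying binary file. :)
xdmp:document-insert("/copies/big-1.bin", fn:doc("/seed/big.bin"))
;
(: Path 2: load the same content from the filesystem.
   Each such load DOES grow the Large directory in the forest. :)
xdmp:document-load("/tmp/seed/big.bin",
  <options xmlns="xdmp:document-load">
    <uri>/copies/big-2.bin</uri>
  </options>)
```

Only path 2 seems to produce genuinely independent copies, which is why I use it for generating test data.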
When I delete or rename one of these binaries, it doesn't seem to affect the others that share the same "parent." This seems very useful, except when I'm trying to generate a lot of large binary data for testing, only to find that it's all linked under the covers and not a very good test set. Hence, I have been loading the seed files from the filesystem to prevent that linkage, so I can generate a large test set from a small set of binaries. Is my understanding mostly correct? -Rayn
_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general