I have found that if I load a Large Binary out of the DB and
xdmp:document-insert() it to a different URI in the DB, the Large data usage
for the DB doesn't change, and the Large directory in the forest dir doesn't
change either. However, if I load the binary from the file system, then the
Large data usage grows by the size of the Large Binary files being
document-inserted. This leads me to believe there is some optimization going
on whereby Large Binaries that originate from the DB are just pointers to a
single master record of the binary data itself. So if the file is in ten
places in the DB, even under different filenames, MarkLogic points all of
them back to a single on-disk binary file.

When I delete or rename one of these binaries, it doesn't seem to affect the
others that share the same "parent." This seems very useful, except when I'm
trying to generate a lot of Large Binary data for testing, only to find out
that it's all linked under the covers and therefore not a very good test set.
Hence, I have been loading the seed files from the filesystem each time to
prevent that linkage, so I can generate a large test set from a small set of
binaries.
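
For reference, here's a minimal XQuery sketch of the two load paths I'm comparing. The URIs and the filesystem path are hypothetical, and the comments describe what I *observed*, not documented behavior:

(: 1. Copy an existing large binary within the database.
   In my tests this does NOT grow Large data usage -- the new
   document appears to share the original's on-disk binary. :)
xdmp:document-insert("/copies/video-copy.bin",
  fn:doc("/originals/video.bin"));

(: 2. Load the same content from the filesystem.
   This DOES grow Large data usage by the file's size. :)
xdmp:document-insert("/copies/video-from-fs.bin",
  xdmp:document-get("/seed/video.bin"))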

Is my understanding mostly correct?

-Rayn
                                          
_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general
