Nice work, Sergiu. We should turn this into a JIRA issue so we don't forget it.
One other idea: store attachments on the file system instead of in the DB.

Thanks
-Vincent

On Feb 27, 2008, at 3:48 PM, Sergiu Dumitriu wrote:

> Hi devs,
>
> Last night I checked what happens when uploading a file, and why that
> action requires huge amounts of memory.
>
> Whenever a file is uploaded, its content is loaded into memory in
> several places:
> - as an XWikiAttachment, as a byte[] ~= filesize
> - as an XWikiAttachmentArchive, as a Base64 encoded string ~= 2*4*filesize
> - as Hibernate tokens sent to the database, clones of the
>   XWikiAttachment and XWikiAttachmentArchive data ~= 9*filesize
> - as the cached attachment and attachment archive, clones of the same
>   2 objects ~= 9*filesize
>
> Total: ~27*filesize bytes in memory.
>
> So a 10M file needs at least 270M of memory.
>
> Worse, if this is not the first version of the attachment, the complete
> attachment history is loaded into memory as well, so add another
> 24*versionsize*versions of memory needed during upload.
>
> After the upload is done, most of this is cleared; only the cached
> objects remain in memory.
>
> However, a problem still remains with the cache. It is an LRU cache
> with a fixed capacity, so even if the memory is full, the cached
> attachments will not be released.
>
> Things we can improve:
> - Make the cache use References. This would allow cached attachments to
>   be removed from memory when there's a need for more memory.
> - Build a better attachment archive system. I'm not sure diff-based
>   versioning of attachments is a good idea. In theory it saves space
>   when versions are much alike, but it does not really work in practice,
>   because it does a line-based diff and a Base64 encoded string contains
>   no newlines. What's more, the space gain would only pay off with many
>   versions, since a single version alone takes 4 times more space than a
>   binary dump of the content.
>
> If we switched to a "one version per table row" scheme for attachment
> history, with a direct binary dump, the memory needed for uploading
> would be about 6*filesize, which is much less.
> --
> Sergiu Dumitriu
> http://purl.org/net/sergiu/
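To make the "store attachments on the file system" idea a bit more concrete, here is a minimal sketch of a filesystem-backed store. The class and method names are made up, and it uses the standard java.nio.file API rather than anything XWiki-specific; the point is only that content is streamed to and from disk, so a 10M attachment never has to sit in memory as a byte[], a Base64 string, or a Hibernate parameter.

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical filesystem-backed attachment store: one file per
// attachment version, streamed directly to/from disk.
public class FilesystemAttachmentStore
{
    private final Path root;

    public FilesystemAttachmentStore(Path root)
    {
        this.root = root;
    }

    private Path locate(String documentId, String attachmentName, String version)
    {
        return this.root.resolve(documentId).resolve(attachmentName).resolve(version);
    }

    public void save(String documentId, String attachmentName, String version, InputStream content)
        throws IOException
    {
        Path target = locate(documentId, attachmentName, version);
        Files.createDirectories(target.getParent());
        // Streams the upload straight to disk in small buffers.
        Files.copy(content, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public InputStream load(String documentId, String attachmentName, String version) throws IOException
    {
        return Files.newInputStream(locate(documentId, attachmentName, version));
    }
}

Versioning then comes for free (one file per version), deleting an attachment is just removing its directory, and the database would only keep the metadata.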
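For the "make the cache use References" point, here is a rough sketch of what that could look like, assuming a plain map-based cache (the names are hypothetical, not the actual XWiki cache API). The LRU bound still caps the number of entries, but the values are held through SoftReferences, so the garbage collector is free to reclaim cached attachments when the heap runs low instead of throwing OutOfMemoryError.

import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache whose values can additionally be reclaimed
// by the GC under memory pressure, because they are only softly
// reachable through the cache.
public class SoftReferenceCache<K, V>
{
    private final int capacity;

    private final Map<K, SoftReference<V>> entries;

    public SoftReferenceCache(final int capacity)
    {
        this.capacity = capacity;
        // accessOrder = true makes the LinkedHashMap iterate in LRU order.
        this.entries = new LinkedHashMap<K, SoftReference<V>>(16, 0.75f, true)
        {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, SoftReference<V>> eldest)
            {
                return size() > SoftReferenceCache.this.capacity;
            }
        };
    }

    public synchronized void put(K key, V value)
    {
        this.entries.put(key, new SoftReference<V>(value));
    }

    public synchronized V get(K key)
    {
        SoftReference<V> reference = this.entries.get(key);
        if (reference == null) {
            return null;
        }
        V value = reference.get();
        if (value == null) {
            // The GC already reclaimed this attachment; drop the stale entry.
            this.entries.remove(key);
        }
        return value;
    }
}

SoftReference (rather than WeakReference) fits a cache: the values survive ordinary GC cycles and are only cleared when memory is actually needed.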
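And for the "one version per table row" idea at the end, a hypothetical mapping sketched with JPA annotations (XWiki's real mappings are, as far as I know, plain Hibernate XML files, and the table and column names below are invented): each version is stored as a raw binary row, with no Base64 and no diffs, so loading or saving one version only touches that version's bytes.

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Lob;
import javax.persistence.Table;

// Hypothetical "one row per attachment version" archive entry.
@Entity
@Table(name = "xwikiattachmentversion")
public class AttachmentVersionRow
{
    @Id
    private long id;

    @Column(name = "docId")
    private long documentId;

    @Column(name = "filename")
    private String filename;

    @Column(name = "version")
    private String version;

    // Raw bytes of this single version: no Base64 encoding, no line diff.
    @Lob
    @Column(name = "content")
    private byte[] content;

    // Getters and setters omitted for brevity.
}

Restoring an old version is then a single-row select, and uploading a new version inserts one row without rewriting (or even reading) the rest of the history.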

