Vincent Massol wrote:
> Nice work Sergiu. We should transform this into a jira issue so we
> don't forget it.
We should vote for it first.

> One other idea: store attachments on the file system and not in the DB.
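That would also attack the memory problem at the root: the upload stream
can be copied to disk in small chunks, so the content never has to sit
in the heap at all. A rough sketch of what that could look like (the
FileSystemAttachmentStore class and the path layout are invented here,
nothing like this exists in the code yet):

  import java.io.File;
  import java.io.FileOutputStream;
  import java.io.IOException;
  import java.io.InputStream;
  import java.io.OutputStream;

  public class FileSystemAttachmentStore
  {
      private final File storageRoot;

      public FileSystemAttachmentStore(File storageRoot)
      {
          this.storageRoot = storageRoot;
      }

      /**
       * Copies the uploaded stream to disk in 8K chunks, so the memory
       * used stays at the buffer size instead of a multiple of filesize.
       */
      public void saveAttachment(String docName, String fileName,
          InputStream content) throws IOException
      {
          File target = new File(new File(storageRoot, docName), fileName);
          target.getParentFile().mkdirs();
          OutputStream out = new FileOutputStream(target);
          try {
              byte[] buffer = new byte[8192];
              int read;
              while ((read = content.read(buffer)) != -1) {
                  out.write(buffer, 0, read);
              }
          } finally {
              out.close();
          }
      }
  }

No matter how large the file is, the only memory needed per upload would
be the 8K buffer.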
> Thanks
> -Vincent
>
> On Feb 27, 2008, at 3:48 PM, Sergiu Dumitriu wrote:
>
>> Hi devs,
>>
>> Last night I checked what happens when uploading a file, and why that
>> action requires huge amounts of memory.
>>
>> Whenever uploading a file, there are several places where the file
>> content is loaded into memory:
>> - as an XWikiAttachment, holding a byte[] ~= filesize
>> - as an XWikiAttachmentArchive, holding a Base64-encoded string
>> ~= 2*4*filesize
>> - as Hibernate tokens that are sent to the database, clones of the
>> XWikiAttachment and XWikiAttachmentArchive data ~= 9*filesize
>> - as the cached attachment and attachment archive, clones of the same
>> two objects ~= 9*filesize
>>
>> Total: (1 + 8 + 9 + 9) * filesize ~= 27*filesize bytes in memory.
>>
>> So, for a 10M file, we need at least 270M of memory.
>>
>> Worse, if this is not the first version of the attachment, the
>> complete attachment history is loaded into memory as well, so add
>> another 24*versionsize*versions bytes needed during the upload.
>>
>> After the upload is done, most of these are cleared; only the cached
>> objects remain in memory.
>>
>> However, a problem still remains with the cache. It is an LRU cache
>> with a fixed capacity, so even when memory is full, the cached
>> attachments will not be released.
>>
>> Things we can improve:
>> - Make the cache use References. This would allow cached attachments
>> to be dropped from memory when there is a need for more memory (see
>> the SoftReference sketch below).
>> - Build a better attachment archive system. I'm not sure diff-based
>> versioning of attachments is a good idea. In theory it saves space
>> when versions are much alike, but it does not work in practice,
>> because it does a line-based diff and a Base64-encoded string has no
>> newlines (a quick demonstration below shows this). What's more, the
>> space gain would only pay off with many versions, since one version
>> alone already takes 4 times more space than a binary dump of the
>> content.
>>
>> Suppose we switch to "one version per table row" for the attachment
>> history, with a direct binary dump of the content: the memory needed
>> for uploading would then be 6*filesize, which is much less.
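To make that last proposal concrete, a row-per-version mapping could
look like this (the class name and table layout are hypothetical, just
to illustrate the idea):

  /**
   * One row per attachment version. The content is a raw BLOB, so
   * nothing is Base64-encoded and no diff is ever computed.
   *
   * Table sketch: XWIKIATTACHMENTVERSION(ATTACHMENT_ID, VERSION,
   * DATE, AUTHOR, CONTENT BLOB), primary key (ATTACHMENT_ID, VERSION).
   */
  public class XWikiAttachmentVersion
  {
      private long attachmentId;
      private String version;
      private java.util.Date date;
      private String author;
      private byte[] content;

      // Getters and setters omitted; Hibernate would map this class to
      // the table above, with 'content' as a BLOB column.
  }

Saving a new version would then write a single row, and loading one
version would select a single row, instead of materializing the whole
history.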
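And to make the References idea from the list above concrete: holding
cache values through SoftReference lets the garbage collector reclaim
attachment content when the heap runs low, while entries stay available
as long as there is room. A minimal sketch of the principle (not our
actual cache API):

  import java.lang.ref.SoftReference;
  import java.util.HashMap;
  import java.util.Map;

  public class SoftAttachmentCache
  {
      /** The GC may clear these references when memory runs low. */
      private final Map<String, SoftReference<byte[]>> cache =
          new HashMap<String, SoftReference<byte[]>>();

      public synchronized void put(String key, byte[] content)
      {
          cache.put(key, new SoftReference<byte[]>(content));
      }

      public synchronized byte[] get(String key)
      {
          SoftReference<byte[]> ref = cache.get(key);
          if (ref == null) {
              return null;
          }
          byte[] content = ref.get();
          if (content == null) {
              // The GC reclaimed the content; drop the stale entry.
              cache.remove(key);
          }
          return content;
      }
  }

A real implementation would still bound the number of entries, but the
references make sure a full cache cannot keep the heap exhausted.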
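As for the line diff, this quick check shows why it degenerates: the
Base64 form of the content is one single line, so a line-based diff of
two versions always contains the entire new content (assuming
commons-codec, whose encodeBase64 produces unchunked output):

  import org.apache.commons.codec.binary.Base64;

  public class Base64DiffDemo
  {
      public static void main(String[] args)
      {
          byte[] content = new byte[3000];
          String encoded = new String(Base64.encodeBase64(content));
          // Prints "false": no newline anywhere, so the whole version
          // is one single "line" from the diff engine's point of view.
          System.out.println(encoded.indexOf('\n') >= 0);
      }
  }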
--
Sergiu Dumitriu
http://purl.org/net/sergiu/